0. INTRODUCTION

These notes have been prepared for students studying the Part II (C) course 
Automata and Formal Languages. They serve as the backbone of the course. As 
such, they contain all the essential definitions, theorems, and proofs. However, 
they do not form the body of the course; that is best obtained from attending 
the lectures. In particular, there are very few examples in these notes. They
have been omitted because examples should be built and worked through in
front of the reader for maximum effect. Merely seeing the final product does
not impart the same intuition and insight as seeing the example being worked
through in real time. With this in mind, students are encouraged to keep a
copy of the notes on hand to check that they have transcribed definitions and
theorems correctly, but not to learn from them. These notes are a memory aid,
not a learning tool. Treat them as such.

These notes contain material on all three sections of the course. To remove 
any ambiguity, the course content (i.e., examinable material) will be hereby 
defined as the material that is lectured. 

For those of you who are reading this in digital format, you can click on items 
in the table of contents to go directly to that part in the text. Also, when you 
click on a definition number, you will go directly to the statement. The same 
applies for lemmata, theorems, etc. 

For those of you reading this in printed format, yet yearning to enter the digital
age, the pdf notes reside in an encrypted container called Automata2018.tc,
which can be found by following the teaching link from my homepage at 
https://www.dpmms.cam.ac.uk/~mcc56/ 

To open the container, you’ll need to use the cryptographic software TrueCrypt 
or VeraCrypt (the latter might be better for Mac users; if you do use VeraCrypt, 
then you need to select ‘TrueCrypt mode’ when decrypting). Downloads for 
most operating systems, and instructions, can be found at 

https://www.grc.com/misc/truecrypt/truecrypt.htm 
https://www.veracrypt.fr/en/Home.html 

You will also need the following password [1] to open the container:

HandTakeAlsoFreight2018CableNorthCloud 


[1] Or a lot of spare CPU power.

0.1. List of reference texts.

Each of the three sections of this course was designed from a different reference
text [2]. These are:

(1) Register machines and computability theory: 

P.T. Johnstone, Notes on logic and set theory (Chapter 4), Cambridge 
University Press, 1987. 

(2) Regular languages and finite-state automata: 

J.E. Hopcroft, R. Motwani and J.D. Ullman, Introduction to automata
theory, languages and computation (Chapters 2-4), 2nd edn, Addison-Wesley,
2001. (Note that this is not the edition stated in the course description.)

(3) Pushdown automata and context-free languages: 

D.C. Kozen, Automata and computability (Lectures 19-25), Springer, 
1997. 

If you insist on obtaining only one text for the course then the book by Kozen 
will serve you best, but by no means fully. 

0.2. Shorthand conventions. 

Throughout the lectures I will make use of certain shorthand conventions. 
Some of these are standard, others are not. For clarity, here is a list of those 
which I will be using. 

b/c = because

c/f = comes from 

w/ = with 

w/o = without 

wts = want to show 

wlog = without loss of generality 

thm = theorem 

defn = definition 

lem = lemma 

cor = corollary 

pf = proof 

eg = example 

ex = exercise 

RM = register machine 

PC = partial computable 

PR = partial recursive 

Prim R = primitive recursive 

s.t. = such that 

rec = recursive 

iff = if and only if 


[2] Because life is never as straightforward as it should be.

1. REGISTER MACHINES AND COMPUTABILITY THEORY

To explore what is incomputable, we first need a robust definition of what it 
means for a problem to be computable. There are many equivalent ways to do 
this; we present one of them here. 

1.1. Register machines and computable functions. 

Definition 1.1 (Register machines). 

A register machine consists of two parts: a sequence of registers $R_1, R_2, \ldots$,
and a finite program.

A register is a place to hold, at any time, an arbitrary natural number (including
0). Think of these as ‘buckets’, each holding a single integer.

A program (often denoted $P$) is defined by specifying a finite number of states
$S_0, S_1, \ldots, S_n$, and, for (not necessarily all) $i \in \{0, \ldots, n\}$, an instruction to be
carried out when the machine is in state $S_i$. These instructions are of two types:

(1) add 1 to register $R_j$ and move to state $S_k$ (written $S_i : (j, +, k)$);

(2) test whether $R_j$ holds the integer 0: if it does, move to state $S_l$; otherwise,
subtract 1 from it and move to state $S_k$ (written $S_i : (j, -, k, l)$).

The input of a register machine is a finite (ordered) set of registers $(R_1, \ldots, R_n)$,
each containing a non-negative integer, and by convention we set all other
registers $\{R_j\}_{j > n}$ to contain 0.

$S_1$ is the initial state, and it is from this state that we begin applying the
program to the registers. $S_0$ is the terminal state; upon reaching it, the machine
ceases to operate, so there is no need to have an instruction associated to $S_0$.
The machine is permitted to move to a (non-terminal) state with no instruction
associated to it (written $S_i : \emptyset$). If this occurs, the machine simply sits in
limbo forever, neither terminating nor performing further computation.

Of course, a register machine can only ever change the entries in finitely many
of the registers, as there are only finitely many states in a program, and each
state can only modify at most one specific register. We continue to use an
infinite sequence of registers $R_1, R_2, \ldots$ to save having to specify how many are
needed.

We can describe a register machine in many ways. Two such ways, which we 
give here, are via a sequence of instructions or a program diagram. 

Definition 1.2 (Sequence of instructions). 

A sequence of instructions for a register machine with program $P$ is simply
the collection of instructions of $P$, written $S_i : (j, +, k)$ or $S_i : (j, -, k, l)$. By
writing out the triples $(j, +, k)$ and quadruples $(j, -, k, l)$ in an ordered list, it
is assumed that $S_1$ corresponds to the first instruction in the list, $S_2$ to the
second, and so on; thus we usually omit the prefix $S_i$ on each instruction. This
completely describes the register machine with program $P$.
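A sequence of instructions can be interpreted mechanically. The following is a minimal Python sketch, not part of the original notes: the name `run`, the dictionary of registers, the use of `None` for a state with no instruction, and the `max_steps` cut-off are all assumptions made for illustration. A real register machine has no step limit, so a result of `None` here only means "did not reach $S_0$ within the budget".

```python
def run(program, inputs, max_steps=10_000):
    """Simulate a register machine given as an ordered list of instructions.

    program[i - 1] is the instruction for state S_i: (j, '+', k),
    (j, '-', k, l), or None if S_i has no instruction.
    Returns the registers (index -> contents) on reaching S_0, else None.
    """
    regs = {}
    for index, value in enumerate(inputs, start=1):
        regs[index] = value                  # R_1, ..., R_n hold the input
    state = 1                                # S_1 is the initial state
    for _ in range(max_steps):
        if state == 0:                       # S_0 is the terminal state
            return regs
        if state > len(program) or program[state - 1] is None:
            return None                      # no instruction: limbo forever
        instr = program[state - 1]
        j = instr[0]
        if instr[1] == '+':                  # (j, +, k)
            regs[j] = regs.get(j, 0) + 1
            state = instr[2]
        elif regs.get(j, 0) == 0:            # (j, -, k, l) with R_j = 0
            state = instr[3]
        else:                                # (j, -, k, l) with R_j > 0
            regs[j] -= 1
            state = instr[2]
    return regs if state == 0 else None

# Two instructions that add 2 to R_1 and then halt.
print(run([(1, '+', 2), (1, '+', 0)], [5]))  # {1: 7}
```

Unmentioned registers are treated as holding 0, which matches the convention that all registers beyond the input are set to 0.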

Definition 1.3 (Program diagrams). 

A program diagram for a register machine with program $P$ is a graph $\Gamma$ with
directed edges (some of which are labelled) and labelled vertices. The vertex
set of $\Gamma$ consists of the states of $P$. We then generate edges as follows: For each
instruction of type (1) (when in state $S_i$, add 1 to register $R_j$ and move to state
$S_k$), we include a directed edge from $S_i$ to $S_k$, with label $R_j + 1$.

[Diagram: a single directed edge from $S_i$ to $S_k$, labelled $R_j + 1$.]

For each instruction of type (2) (when in state $S_i$, test whether $R_j$ holds the
integer 0: if it does, move to state $S_l$; otherwise, subtract 1 from it and move
to state $S_k$), we include two edges. One is a directed edge from $S_i$ to $S_k$ with
label $R_j - 1$. The other is an unlabelled directed edge (often written as a dotted
edge) from $S_i$ to $S_l$.

[Diagram: a directed edge from $S_i$ to $S_k$, labelled $R_j - 1$, and an unlabelled dotted edge from $S_i$ to $S_l$.]

This completely describes the register machine with program P. 

Notice that we have not specified the registers at all in the above two
definitions. This is because they are not needed in order to define the register
machine. We give the machine an input (a finite sequence of ‘filled’ registers),
and the rest are set to 0 by default. However, the machine itself does not
care which registers we pre-set to non-0 entries. The machine could, in fact,
commence with all registers set to 0 (so with no external input).

Each of the above two definitions can be used to completely describe a register 
machine. A program diagram is more intuitive, but harder to write down. A 
sequence of instructions is easy to write down, but difficult to follow. From one 
such description, we can always convert to the other. 

We have specified an initial state $S_1$ and a final state $S_0$ in our register
machines. However, it might be the case that the machine never reaches the
final state $S_0$; an easy example of this is a program with two states $S_0, S_1$, where
the instruction for $S_1$ is ‘add 1 to register $R_1$, then move into state $S_1$ again’
(i.e., with sequence of instructions $S_1 : (1, +, 1)$). This machine just keeps
adding 1 to register $R_1$, and never reaches state $S_0$. The key idea here is that
we care about the inputs on which the register machine eventually reaches $S_0$.

Definition 1.4 (Halting sets). 

A register machine with program $P$ is said to halt on input $(m_1, \ldots, m_k) \in \mathbb{N}^k$
if, when given an input of registers $(R_1, \ldots, R_k)$ with each $R_i$ holding integer
$m_i$ (and all others holding 0), the machine eventually reaches state $S_0$ after
application of finitely many instructions. We write this as $P(m_1, \ldots, m_k)\downarrow$.
The halting set of $P$, written $H(P)$, is the set of inputs on which $P$ halts. That
is,

$$H(P) := \bigcup_{k > 0} \{(m_1, \ldots, m_k) \in \mathbb{N}^k \mid P(m_1, \ldots, m_k)\downarrow\}$$

If $P$ does not halt on input $(m_1, \ldots, m_k)$, then we write $P(m_1, \ldots, m_k)\uparrow$.

Definition 1.5 (Upper register of a program).

There will be some finite $k$, for each program $P$, for which all registers after
the $k$-th are ‘ignored’ (or ‘not touched’) by $P$. That is, if we set

$$\mathrm{upper}(P) := \max\{i \in \mathbb{N} \mid (i, +, j) \text{ or } (i, -, j, l) \text{ is in } P\}$$

then no register $R_j$ with index $j > \mathrm{upper}(P)$ will ever be modified by $P$. We
call this index $\mathrm{upper}(P)$ the upper register index of $P$.
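As a small illustration (an assumed sketch, not from the notes), the upper register index can be read straight off a sequence of instructions stored as Python tuples `(j, '+', k)` and `(j, '-', k, l)`. Returning 0 for an empty program is a convention chosen here, not something the definition specifies.

```python
def upper(program):
    """Largest register index mentioned by any instruction of the program."""
    # instr[0] is the register index in both (j, '+', k) and (j, '-', k, l).
    return max((instr[0] for instr in program), default=0)

example = [(1, '-', 2, 4), (3, '+', 3), (2, '+', 1), (3, '-', 5, 0), (1, '+', 4)]
print(upper(example))  # 3
```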

So each program P has a maximum number of input registers it can process. 
Of course, we want our machines to actually take an input and give an output; 
that is, we want them to compute something. 

Definition 1.6 (Partial computable functions). 

A partial function $f : \mathbb{N}^k \to \mathbb{N}$ is said to be partial computable by a program
$P$ if, for all $(m_1, \ldots, m_k) \in \mathbb{N}^k$ where $f(m_1, \ldots, m_k)$ is defined, we have that
$P(m_1, \ldots, m_k)\downarrow$, with $f(m_1, \ldots, m_k)$ in register $R_1$ when it halts.

We allow our definition to include partial functions (those defined on a subset
of $\mathbb{N}^k$). When this happens, we require that $P(m_1, \ldots, m_k)\uparrow$ for the inputs
on which $f(m_1, \ldots, m_k)$ is undefined. Thus, every program $P$ for a register
machine defines an $n$-variable partial function for each $n > 0$ (though different
programs can define the same function).

We now introduce some basic techniques for register machine operations. We 
will use these several times in later proofs, so it helps to establish their existence 
now. 

Lemma 1.7 (Addition of registers). We can write a program $P$ (or a subroutine
of a program) to add the contents of $R_i$ to $R_j$, leaving $R_i$ unchanged at the end.

Proof. A program that does this is given by the sequence of instructions:

$S_1 : (i, -, 2, 4)$

$S_2 : (n, +, 3)$ where $n$ is larger than both $i$ and $j$

$S_3 : (j, +, 1)$

$S_4 : (n, -, 5, 0)$

$S_5 : (i, +, 4)$

If instead we wanted to add the above subroutine to a program $P$, we
would need to choose $n$ to be larger than both the upper register index for $P$
and the largest input register for $P$. □
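The five instructions above can be checked by simulation. This is a hedged sketch rather than anything from the notes: `run` and `add_program` are hypothetical helper names, registers are kept in a dictionary, and the step budget is an artificial safeguard.

```python
def run(program, regs, max_steps=100_000):
    """Run (j, '+', k) / (j, '-', k, l) instructions starting from state S_1.

    regs maps register index -> contents; missing registers count as 0.
    """
    regs = dict(regs)
    state = 1
    for _ in range(max_steps):
        if state == 0:                       # reached the terminal state S_0
            return regs
        instr = program[state - 1]
        j = instr[0]
        if instr[1] == '+':
            regs[j] = regs.get(j, 0) + 1
            state = instr[2]
        elif regs.get(j, 0) == 0:
            state = instr[3]
        else:
            regs[j] -= 1
            state = instr[2]
    raise RuntimeError("step budget exhausted")

def add_program(i, j, n):
    """The instructions of Lemma 1.7, using R_n as a scratch register."""
    return [(i, '-', 2, 4), (n, '+', 3), (j, '+', 1), (n, '-', 5, 0), (i, '+', 4)]

result = run(add_program(1, 2, 3), {1: 4, 2: 6})
print(result)  # {1: 4, 2: 10, 3: 0}
```

States $S_1$ to $S_3$ drain $R_i$ into both $R_j$ and the scratch register $R_n$; states $S_4$ and $S_5$ then drain $R_n$ back into $R_i$, which is why $R_i$ ends with its original contents.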

Note that, in the above proof, we have specified which instruction corresponds
to which state. Usually we just write out the instructions in a list, and take
the $i$-th instruction in the list to correspond to state $S_i$.

The process described in Lemma 1.7 would be much easier if we just wanted
to transfer the contents of $R_i$ to $R_j$ (that is, emptying $R_i$ in the process). We
leave this as an exercise.

Lemma 1.8 (Emptying registers). We can write a program $P$ (or a subroutine
of a program) to empty the register $R_i$.

Proof. A program that does this is given by the sequence of instructions:

$S_1 : (i, -, 1, 0)$. □

1.2. Partial recursive functions. 

We can now give a large class of functions which are partial computable. 

Theorem 1.9 (Closure properties of partial computable functions). 

a) (Basic functions) For each $i \leq k$, the projection function $(n_1, \ldots, n_k) \mapsto n_i$
is partial computable.

b) (Basic functions) The constant function with value 0 (that is, $n \mapsto 0$), and
the successor function $n \mapsto n + 1$, are partial computable.

c) (Composition) If $f$ is a partial computable function on $k$ variables, and
$g_1, \ldots, g_k$ are partial computable functions each on $l$ variables, then the function
$h$ on $l$ variables given by

$$h(n_1, \ldots, n_l) := f(g_1(n_1, \ldots, n_l), \ldots, g_k(n_1, \ldots, n_l))$$

is also partial computable. Note that we take $h$ as being defined on $(n_1, \ldots, n_l)$
when each $g_j$ is defined on $(n_1, \ldots, n_l)$ and $f$ is defined on
$(g_1(n_1, \ldots, n_l), \ldots, g_k(n_1, \ldots, n_l))$.

d) (Recursion) If $f$ and $g$ are partial computable functions on $k$ and $k + 2$
variables respectively, then the following function $h$ on $k + 1$ variables defined
inductively by

$$h(n_1, \ldots, n_k, 0) := f(n_1, \ldots, n_k)$$

$$h(n_1, \ldots, n_k, n_{k+1} + 1) := g(n_1, \ldots, n_k, n_{k+1}, h(n_1, \ldots, n_k, n_{k+1}))$$

is partial computable (and defined if and only if $f$ and $g$ are appropriately
defined).

e) (Minimalisation) If $f$ is a partial computable function on $k + 1$ variables,
then the following partial function $g$ on $k$ variables defined by

$$g(n_1, \ldots, n_k) := \begin{cases} n & \text{if } f(n_1, \ldots, n_k, n) = 0 \text{ and } f(n_1, \ldots, n_k, m) > 0 \ \forall m < n \\ \text{undefined} & \text{otherwise} \end{cases}$$

is partial computable.


Parts a) and b) of this theorem say that the basic functions are partial
computable, part c) says that partial computable functions are closed under
composition, part d) says that partial computable functions are closed under
recursion, and part e) says that partial computable functions are closed under
minimalisation.
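To make the three closure operations concrete, here is an illustrative Python sketch (an assumption for exposition, not the register machine construction used in the proof). Python functions stand in for partial functions, with non-termination playing the role of "undefined"; the names `compose`, `recurse`, `minimise`, `zero`, `succ` and `proj` are hypothetical.

```python
def compose(f, *gs):
    """h(n_1..n_l) = f(g_1(n_1..n_l), ..., g_k(n_1..n_l))."""
    return lambda *ns: f(*(g(*ns) for g in gs))

def recurse(f, g):
    """h(ns, 0) = f(ns);  h(ns, m + 1) = g(ns, m, h(ns, m))."""
    def h(*args):
        *ns, m = args
        value = f(*ns)
        for step in range(m):
            value = g(*ns, step, value)
        return value
    return h

def minimise(f):
    """g(ns) = least n with f(ns, n) = 0; never returns if no such n exists."""
    def g(*ns):
        n = 0
        while f(*ns, n) != 0:
            n += 1
        return n
    return g

# Basic functions: constant zero, successor, and projection onto entry i.
zero = lambda n: 0
succ = lambda n: n + 1
def proj(i):
    return lambda *ns: ns[i - 1]

one = compose(succ, zero)                       # n -> 1
# Least m with (m + 1)^2 > n, i.e. the integer square root of n.
isqrt = minimise(lambda n, m: 0 if (m + 1) ** 2 > n else 1)
print(one(5), proj(2)(7, 8, 9), isqrt(10))      # 1 8 3
```

The `while` loop in `minimise` is the only place where a computation can fail to terminate, which mirrors the fact that minimalisation is the only operation that can take total functions to a properly partial one.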

Our proof will make use of the subroutines given in Lemmata 1.7 and 1.8, so 
we won’t write them out again explicitly. 


Proof. 

a) For $i = 1$, the projection function can be computed by a program which
does nothing to register $R_1$, and enters the halt state immediately. For example,
the 1-line program $(2, +, 0)$ will suffice. For $i > 1$, we take the program which
first empties $R_1$, then transfers $R_i$ to $R_1$, then halts. For example, the following
program will suffice: $(1, -, 1, 2)$, $(i, -, 3, 0)$, $(1, +, 2)$.

b) These two functions can be computed by the one-line programs $(1, -, 1, 0)$
and $(1, +, 0)$ respectively.

c) We construct a program to compute $h$ as follows. First, let $M_1$ be the
supremum of the upper register indices of the $g_j$'s and of $f$. Now set
$M = M_1 + k + l$ (so none of the registers after $R_M$ will ever be non-zero). Let
$n = (k + 1)M$, and now transfer the contents of $R_1, \ldots, R_l$ to $R_{n+1}, \ldots, R_{n+l}$
respectively, setting each of $R_1, \ldots, R_l$ to 0 in the process; this is to ‘store them
safely’ so that they are not affected by the rest of our computation. Now, for
each $1 \leq i \leq k$, we copy (without deletion) the registers $R_{n+1}, R_{n+2}, \ldots, R_{n+l}$ to
$R_{iM+1}, R_{iM+2}, \ldots, R_{iM+l}$, then perform the computation of $g_i$ with all registers
shifted $iM$ places to the right, then store the output in $R_i$. (Observe that $M$ was
chosen sufficiently large so that at no time in the computation will we reach, or
pass, register $R_{(i+1)M}$. Moreover, the computation will not touch any register
below $R_{iM+1}$, as the original computation never tries to touch a register below
$R_1$.) Note that, starting with registers containing $(n_1, \ldots, n_l, 0, \ldots)$, we now
have registers with contents

$$(g_1(n_1, \ldots, n_l), \ldots, g_k(n_1, \ldots, n_l), \underbrace{0, \ldots, 0}_{M - k \text{ 0's}}, \text{other entries})$$

Finally, perform the computation for $f$. Note that $\mathrm{upper}(f) \leq M_1$, and both
$M_1, l < M$. Thus the entries beyond register $R_M$ will be irrelevant to the
computation of $f$. So, after computing $f$ we will be left with
$f(g_1(n_1, \ldots, n_l), \ldots, g_k(n_1, \ldots, n_l))$ in $R_1$, as required.

d) This is similar to the previous case. We first copy (without deletion) the
entries in registers $R_1, \ldots, R_{k+1}$ to $R_{n+1}, \ldots, R_{n+k+1}$, with $n$ chosen to ‘store
these entries safely’ as follows: let $M_1$ be the supremum of the upper register
indices of $f$ and $g$, and set $n = M_1 + k + 2$. Now, set register $R_{k+1}$ to 0,
run the computation for $f$, take the output (the contents of $R_1$) and copy it
to register $R_{n+k+2}$. (*) Subtract 1 from register $R_{n+k+1}$ (which counts the
number of steps of the recursion remaining). Now set registers $R_1, \ldots, R_n$ to
0, then copy $R_{n+1}, \ldots, R_{n+k}$ to $R_1, \ldots, R_k$, copy $R_{n+k+3}$ (which counts the
number of steps $j$ done so far in the recursion) to $R_{k+1}$, and copy $R_{n+k+2}$ (the
current value $h(n_1, \ldots, n_k, j)$) to $R_{k+2}$. Now run $g$, to end up with the value
$g(n_1, \ldots, n_k, j, h(n_1, \ldots, n_k, j)) = h(n_1, \ldots, n_k, j + 1)$ in $R_1$. Empty $R_{n+k+2}$ and
replace it with the contents of $R_1$, and add one to register $R_{n+k+3}$. Then go back
to (*) and repeat. If, however, $R_{n+k+1}$ was already 0 when we tried to subtract
1 from it, then instead just enter state $S_0$; $R_1$ will contain $h(n_1, \ldots, n_{k+1})$ at
this point.

e) This is similar to the previous case. We first copy the entries in registers
$R_1, \ldots, R_k$ to $R_{n+1}, \ldots, R_{n+k}$, with $n$ chosen to ‘store these entries safely’ as
follows: set $n = \mathrm{upper}(f) + k + 1$. Now enter a subroutine where we empty the
registers $R_1, \ldots, R_{k+1}$ and then replace them with the contents of $R_{n+1}, \ldots, R_{n+k+1}$
respectively, and then perform the computation of $f$. At the end of this
subroutine, we obey $(1, -, j, j')$; from $S_j$ we add 1 to $R_{n+k+1}$ and return to the
beginning of the subroutine; from $S_{j'}$ we empty $R_1$ and then copy $R_{n+k+1}$ to
$R_1$, and then halt. □

In fact, by using the above functions, we can define an interesting class:

Definition 1.10 (Partial recursive functions).

We define the class of partial recursive functions as the smallest class of partial
functions from $\mathbb{N}^n \to \mathbb{N}$ (for all $n$) which is closed under the properties of
Theorem 1.9. That is, such a function $f$ can be constructed from the basic
functions and applications of composition, recursion, and minimalisation a finite
number of times. If $f$ can be constructed without minimalisation, then we say
that it is primitive recursive.

Lemma 1.11. Every primitive recursive function $f : \mathbb{N}^n \to \mathbb{N}$ is total (that is,
defined on all of $\mathbb{N}^n$).

Proof. The projection, constant, and successor functions are obviously total. 
The composition of total functions is again total. Finally, performing primitive 
recursion on total functions is again total. □ 

We point out (without proof) that not every total recursive function is primitive
recursive; the Ackermann function [3] is one such example.

[3] A very interesting function, which we do not have time to cover.

Example 1.12. Addition and multiplication of integers are both primitive
recursive functions.

Proof. For addition, we can use the recursive definition

$$h(n_1, 0) := n_1$$

$$h(n_1, n_2 + 1) := g(n_1, n_2, h(n_1, n_2)) = h(n_1, n_2) + 1$$

where $g(n_1, n_2, h(n_1, n_2)) := h(n_1, n_2) + 1$ is projection onto the 3rd factor,
followed by successor. Thus we see that $h$ is primitive recursive, and
$h(n_1, n_2) = n_1 + n_2$.

For multiplication, we can use the recursive definition

$$h(n_1, 0) := 0$$

$$h(n_1, n_2 + 1) := g(n_1, n_2, h(n_1, n_2)) = h(n_1, n_2) + n_1$$

where $g(n_1, n_2, h(n_1, n_2)) := h(n_1, n_2) + n_1$ is projection onto the 3rd factor,
followed by addition of $n_1$. Thus we see that $h$ is primitive recursive, and
$h(n_1, n_2) = n_1 \cdot n_2$. □

Example 1.13. $(n_1, n_2) \mapsto n_1^{n_2}$ is primitive recursive.

Proof. We can use the recursive definition

$$h(n_1, 0) := 1$$

$$h(n_1, n_2 + 1) := g(n_1, n_2, h(n_1, n_2)) = h(n_1, n_2) \cdot n_1$$

where $g(n_1, n_2, h(n_1, n_2)) := h(n_1, n_2) \cdot n_1$ is projection onto the 3rd factor,
followed by multiplication by $n_1$. Thus we see that $h$ is primitive recursive, and
$h(n_1, n_2) = n_1^{n_2}$. □
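The recursive definitions in Examples 1.12 and 1.13 can be transcribed almost literally. In this sketch (an illustration only; `recurse` is a hypothetical helper implementing the recursion scheme of Theorem 1.9 d) with a loop), each function is built from a base case and a step function `g(n1, n2, previous)`.

```python
def recurse(f, g):
    """h(ns, 0) = f(ns);  h(ns, m + 1) = g(ns, m, h(ns, m))."""
    def h(*args):
        *ns, m = args
        value = f(*ns)
        for step in range(m):
            value = g(*ns, step, value)
        return value
    return h

# h(n1, 0) = n1,  h(n1, n2 + 1) = h(n1, n2) + 1
add = recurse(lambda n1: n1, lambda n1, n2, prev: prev + 1)
# h(n1, 0) = 0,   h(n1, n2 + 1) = h(n1, n2) + n1
mul = recurse(lambda n1: 0, lambda n1, n2, prev: add(prev, n1))
# h(n1, 0) = 1,   h(n1, n2 + 1) = h(n1, n2) * n1
power = recurse(lambda n1: 1, lambda n1, n2, prev: mul(prev, n1))

print(add(3, 4), mul(3, 4), power(2, 10))  # 7 12 1024
```

Each function only uses the one defined before it, so the tower successor, addition, multiplication, exponentiation is visible in the code.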

Theorem 1.9 shows that all partial recursive functions are partial computable. 
We will soon show the converse. First, we need to introduce the following 
notation. 


Definition 1.14. Let $n > 0$ and $i \geq 0$. We write $p_i$ for the $(i + 1)$-th prime (so
$p_0 = 2$), and we write $(n)_i$ for the largest power of $p_i$ which divides $n$ (that is,
the largest exponent $e$ such that $p_i^e$ divides $n$).

Lemma 1.15. The function $(\cdot)_i : \mathbb{N} \to \mathbb{N}$ given above, sending $n \mapsto (n)_i$ if
$n > 0$ (and $0 \mapsto 0$), is primitive recursive, for each fixed $i \geq 0$.

The proof of this result involves showing that several intermediate functions 
are primitive recursive. Many of these are useful in their own right, and some 
will be used explicitly in later proofs. 


Proof. We build this up in stages. In each stage, we define a family of functions
over all $k > 1$ (note that $k$ is an index of these functions, not a variable within
the functions). We show that each individual function in each family is primitive
recursive.

(1) Step functions:

$$\mathrm{step}_k(n) = \begin{cases} 1 & \text{if } 0 \leq n \leq k - 2 \\ 0 & \text{if } n > k - 2 \end{cases}$$

We prove this inductively on $k$. First, we show $\mathrm{step}_2$ is primitive recursive.
This follows from the fact that we can define $\mathrm{step}_2$ via recursion by

$$\mathrm{step}_2(0) := 1$$

$$\mathrm{step}_2(n + 1) := 0$$

Now assume $\mathrm{step}_j$ is primitive recursive for all $j < k$. We can define
$\mathrm{step}_k$ via recursion by:

$$\mathrm{step}_k(0) := 1$$

$$\mathrm{step}_k(n + 1) := \mathrm{step}_{k-1}(n)$$

(2) Delta functions (also defined for $k = 0, 1$):

$$\delta_k(n) = \begin{cases} 1 & \text{if } n = k \\ 0 & \text{if } n \neq k \end{cases}$$

We can define $\delta_k(n)$ via composition and product, using $\mathrm{step}_k$, by:

$$\delta_k(n) := \mathrm{step}_{k+2}(n) \cdot \mathrm{step}_2(\mathrm{step}_{k+2}(n + 1))$$

(3) Truncated successor functions:

$$\mathrm{slope}_k(n) = \begin{cases} n + 1 & \text{if } 0 \leq n \leq k - 2 \\ 0 & \text{if } n > k - 2 \end{cases}$$

We can define $\mathrm{slope}_k$ via recursion, using $\mathrm{step}_k$, by:

$$\mathrm{slope}_k(0) := 1$$

$$\mathrm{slope}_k(n + 1) := \mathrm{step}_k(n + 1) \cdot (\mathrm{slope}_k(n) + 1)$$

(4) Remainder functions:

$$\mathrm{rem}_k(n) = n \bmod k$$

We can define $\mathrm{rem}_k$ via recursion, using $\mathrm{slope}_k$, by:

$$\mathrm{rem}_k(0) := 0$$

$$\mathrm{rem}_k(n + 1) := \mathrm{slope}_k(\mathrm{rem}_k(n))$$

(5) Floor functions:

$$\mathrm{floor}_k(n) = \lfloor n / k \rfloor$$

We can define $\mathrm{floor}_k(n)$ via recursion, using $\delta_0$ and $\mathrm{rem}_k$, by:

$$\mathrm{floor}_k(0) := 0$$

$$\mathrm{floor}_k(n + 1) := \mathrm{floor}_k(n) + \delta_0(\mathrm{rem}_k(n + 1))$$

(6) Division functions:

$$\mathrm{divide}_k(n) = \begin{cases} n / k & \text{if } n \equiv 0 \bmod k \\ 0 & \text{otherwise} \end{cases}$$

We can define $\mathrm{divide}_k(n)$ via composition and product, using $\delta_0$, $\mathrm{rem}_k$
and $\mathrm{floor}_k$, by:

$$\mathrm{divide}_k(n) := \mathrm{floor}_k(n) \cdot \delta_0(\mathrm{rem}_k(n))$$

(7) Division by powers:

$$\mathrm{power}_k(n, m) = \begin{cases} n / k^m & \text{if } n \equiv 0 \bmod k^m \\ 0 & \text{otherwise} \end{cases}$$

We can define $\mathrm{power}_k(n, m)$ via recursion, using $\mathrm{divide}_k$, by:

$$\mathrm{power}_k(n, 0) := n$$

$$\mathrm{power}_k(n, m + 1) := \mathrm{divide}_k(\mathrm{power}_k(n, m))$$

(8) Maximum powers dividing an integer:

$$\mathrm{maxpow}_k(n) = \begin{cases} \text{the largest power of } k \text{ dividing } n & \text{if } n \neq 0 \\ 0 & \text{if } n = 0 \end{cases}$$

We can define $\mathrm{maxpow}_k$ using $\delta_0$ and $\mathrm{power}_k$. First we define an auxiliary
function $h$ by recursion via:

$$h(n, 0) := 0$$

$$h(n, m + 1) := h(n, m) + \delta_0(\delta_0(\mathrm{power}_k(n, m + 1)))$$

Now we define $\mathrm{maxpow}_k$ via composition by:

$$\mathrm{maxpow}_k(n) := h(n, n)$$

Observe that $h(n, n) = 0 + \delta_0(\delta_0(\mathrm{power}_k(n, 1))) + \delta_0(\delta_0(\mathrm{power}_k(n, 2))) + \ldots + \delta_0(\delta_0(\mathrm{power}_k(n, n)))$,
and that $k^n > n$ (as $k > 1$). Also observe that

$$\delta_0(\delta_0(\mathrm{power}_k(n, j))) = \begin{cases} 1 & \text{if } k^j \text{ divides } n \text{ and } n > 0 \\ 0 & \text{otherwise} \end{cases}$$

Thus, for $n > 0$, we have $h(n, n) = \sum_{j=1}^{m} 1 = m$ if $m > 0$ (or 0 if $m = 0$),
where $m$ is the largest power of $k$ dividing $n$. Moreover, $h(0, 0) = 0$.
Finally, we take $(n)_i := \mathrm{maxpow}_{p_i}(n)$, which is primitive recursive. □
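The chain of eight stages can be sanity-checked numerically. The sketch below is illustrative only: it follows the recursive definitions in the proof, with each recursion unrolled into a loop and the index $k$ passed as an ordinary argument. The Python function names are hypothetical, and the code is meant for small inputs, since it is deliberately as inefficient as the definitions themselves.

```python
def step(k, n):
    """step_k(0) = 1, step_2(n + 1) = 0, step_k(n + 1) = step_{k-1}(n); k >= 2."""
    while k > 2 and n > 0:
        k, n = k - 1, n - 1
    return 1 if n == 0 else 0

def delta(k, n):
    return step(k + 2, n) * step(2, step(k + 2, n + 1))

def slope(k, n):
    value = 1                                   # slope_k(0) := 1
    for m in range(n):                          # slope_k(m + 1)
        value = step(k, m + 1) * (value + 1)
    return value

def rem(k, n):
    value = 0                                   # rem_k(0) := 0
    for _ in range(n):
        value = slope(k, value)
    return value

def floor(k, n):
    value = 0                                   # floor_k(0) := 0
    for m in range(n):
        value += delta(0, rem(k, m + 1))
    return value

def divide(k, n):
    return floor(k, n) * delta(0, rem(k, n))

def power(k, n, m):
    value = n                                   # power_k(n, 0) := n
    for _ in range(m):
        value = divide(k, value)
    return value

def maxpow(k, n):
    value = 0                                   # h(n, 0) := 0
    for m in range(n):
        value += delta(0, delta(0, power(k, n, m + 1)))
    return value                                # h(n, n)

print(rem(3, 5), floor(3, 7), divide(3, 6), power(2, 12, 2), maxpow(2, 12))  # 2 2 2 3 2
```

For instance, `maxpow(2, 12)` counts the $j$ in $1, \ldots, 12$ for which $2^j$ divides 12, namely $j = 1, 2$.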

1.3. Equivalence of partial recursive and partial computable functions.

We showed in Theorem 1.9 that partial recursive functions are partial
computable. Now we show that partial computable functions are partial recursive.

Theorem 1.16. Every partial computable function is partial recursive.

In this proof we make use of the functions defined in the proof of Lemma
1.15.


Proof. Let $f : \mathbb{N}^k \to \mathbb{N}$ be a partial computable function on $k$ variables, with
program $P$. We define an auxiliary function $g : \mathbb{N}^{k+2} \to \mathbb{N}$ as follows:

• $g(n_1, \ldots, n_k, 0, t)$ is the number of the state of $P$ reached after $t$ computational
steps, starting at state $S_1$ with input $(n_1, \ldots, n_k, 0, \ldots)$. If $P$ halts (i.e.,
reaches $S_0$) in fewer than $t$ steps on this input, then we take this to be 0.

• $g(n_1, \ldots, n_k, i, t)$ is the contents of register $R_i$ after $P$ has run $t$ computational
steps, starting at state $S_1$ with input $(n_1, \ldots, n_k, 0, \ldots)$. If $P$ halts in
fewer than $t$ steps on this input, then we take the contents of $R_i$ when $P$ halted.

Clearly $g$ is total; we now show that it is actually primitive recursive. Set
$r := \mathrm{upper}(P) + k + 1$. Then $g(n_1, \ldots, n_k, i, t)$ can only be non-zero if $0 \leq i \leq r$.
So, for each fixed $n_1, \ldots, n_k, t$, we can express the values of $g(n_1, \ldots, n_k, i, t)$
for $0 \leq i \leq r$ by the finite sequence $(g_0, \ldots, g_r)$ (where each $g_i$ depends on
$(n_1, \ldots, n_k, t)$), and then we can code this to the integer $2^{g_0} 3^{g_1} \cdots p_r^{g_r}$. Then
this coding function $c : (g_0, \ldots, g_r) \mapsto 2^{g_0} 3^{g_1} \cdots p_r^{g_r}$ is primitive recursive (by
Examples 1.12 and 1.13). Note that the function $(\cdot)_i$ does the following to
integers of the form $2^{g_0} 3^{g_1} \cdots p_r^{g_r}$:

$$(2^{g_0} 3^{g_1} \cdots p_r^{g_r})_i = \begin{cases} g_i & \text{if } 0 \leq i \leq r \\ 0 & \text{if } i > r \end{cases}$$

So $(\cdot)_i$ is the component-wise inverse to $c$. That is, $(c(g_0, \ldots, g_r))_i = g_i$ is
projection onto the $(i + 1)$-th entry. Moreover, $(\cdot)_i$ is primitive recursive, as shown
in Lemma 1.15. We now proceed to define $g$ via primitive recursion on the last
variable. First, we define a function $h : \mathbb{N}^{k+2} \to \mathbb{N}$ via recursion, starting with

$$h(n_0, \ldots, n_k, 0) = 2^{n_0} 3^{n_1} \cdots p_k^{n_k}$$

So when given the index $n_0$ of the initial state of $P$ (which will be 1), as well as
the contents $(n_1, \ldots, n_k)$ of the first $k$ registers, $h(n_0, \ldots, n_k, 0)$ is the integer
which codes these (and is primitive recursive, just like $c$). Now we need a
‘transition function’ for these coded integers to simulate the steps of $P$, which
will give us our recursive definition of $h$. This will be a function which ‘computes
one step of $P$’, and gives us the code for all the new register values and the new
state. We call this function $s : \mathbb{N} \to \mathbb{N}$, with

$$s(2^{n_0} 3^{n_1} \cdots p_m^{n_m}) := 2^{n'_0} 3^{n'_1} \cdots p_r^{n'_r} \, p_{r+1}^{n_{r+1}} \cdots p_m^{n_m}$$

(so $n_l$ is unchanged for all $l > r$), and where

$$n'_0 := \begin{cases} \beta & \text{if } S_{n_0} \mapsto (j, +, \beta) \\ \beta & \text{if } S_{n_0} \mapsto (j, -, \beta, \gamma) \text{ and } n_j > 0 \\ \gamma & \text{if } S_{n_0} \mapsto (j, -, \beta, \gamma) \text{ and } n_j = 0 \\ 0 & \text{if } n_0 = 0 \\ n_0 & \text{if } S_{n_0} \mapsto \emptyset \text{ (no instruction for } S_{n_0}) \end{cases}$$

(where ‘if $S_\alpha \mapsto (j, +, \beta)$’ means ‘if the instruction for state $S_\alpha$ in $P$ is $(j, +, \beta)$’,
and so on). And, for $1 \leq j \leq r$, we have

$$n'_j := \begin{cases} n_j & \text{if } S_{n_0} \mapsto (l, +, \beta) \text{ or } (l, -, \beta, \gamma), \text{ where } l \neq j \\ n_j + 1 & \text{if } S_{n_0} \mapsto (j, +, \beta) \\ n_j - 1 & \text{if } S_{n_0} \mapsto (j, -, \beta, \gamma) \text{ and } n_j > 0 \\ 0 & \text{if } S_{n_0} \mapsto (j, -, \beta, \gamma) \text{ and } n_j = 0 \\ n_j & \text{if } S_{n_0} \mapsto \emptyset \text{ (no instruction for } S_{n_0}) \end{cases}$$

Starting with $a := 2^{n_0} 3^{n_1} \cdots p_m^{n_m}$, we first observe that we can compute, in a
primitive recursive way, $p_{r+1}^{n_{r+1}} \cdots p_m^{n_m}$, as we have

$$p_{r+1}^{n_{r+1}} \cdots p_m^{n_m} = \mathrm{power}_{p_r}(\ldots \mathrm{power}_{p_1}(\mathrm{power}_{p_0}(a, (a)_0), (a)_1) \ldots, (a)_r)$$

Computing $n_j$ (for $1 \leq j \leq r$) is primitive recursive, as it is just the function
$(\cdot)_j$. Moreover, computing $n'_j$ (for $0 \leq j \leq r$) is also primitive recursive,
because we have a finite number of ‘non-trivial exceptions’ in the definitions
of $n'_j$ so we can use functions like $\delta_k$ and $\mathrm{step}_k$ to compute these. So we
can compute $n'_0, \ldots, n'_r$, and the product $p_{r+1}^{n_{r+1}} \cdots p_m^{n_m}$. Thus, as $c$ is primitive
recursive, we see that

$$s(a) = 2^{n'_0} 3^{n'_1} \cdots p_r^{n'_r} \, p_{r+1}^{n_{r+1}} \cdots p_m^{n_m} = c(n'_0, \ldots, n'_r) \cdot \mathrm{power}_{p_r}(\ldots (\mathrm{power}_{p_0}(a, (a)_0), \ldots, (a)_r)$$

is also a primitive recursive function.
Finally, we finish the recursive definition of $h$ with

$$h(n_0, \ldots, n_k, t + 1) := s(h(n_0, \ldots, n_k, t))$$
So $h(1, n_1, \ldots, n_k, t)$ is the coded integer giving the state and registers of
program $P$, on input $(n_1, \ldots, n_k)$, after $t$ computational steps. So we see that
$g : \mathbb{N}^{k+2} \to \mathbb{N}$ is primitive recursive, as

$$g(n_1, \ldots, n_k, i, t) = (h(1, n_1, \ldots, n_k, t))_i = \sum_{j=0}^{r} \delta_j(i) \cdot (h(1, n_1, \ldots, n_k, t))_j$$

Now let $q(n_1, \ldots, n_k)$ be the smallest $t$ such that $g(n_1, \ldots, n_k, 0, t) = 0$, if such a
$t$ exists (interpret this as the number of steps that $P$ takes to reach the halting
state). Thus $q$ is defined via minimalisation (on the primitive recursive function
$g$), and is therefore partial recursive. Then we see that the partial computable
function $f$ is given by

$$f(n_1, \ldots, n_k) = g(n_1, \ldots, n_k, 1, q(n_1, \ldots, n_k))$$

which is partial recursive (but not necessarily primitive recursive). □

From here on, we will interchangeably refer to functions as ‘partial recursive’
or ‘partial computable’; there is no difference as the two classes of functions are
the same.
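The idea behind the proof of Theorem 1.16 is that a whole machine configuration (state index and register contents) fits into one integer $2^{n_0} 3^{n_1} \cdots$, and that one step of $P$ is an arithmetic operation on that integer. The following Python sketch illustrates this under stated assumptions: it uses ordinary integer arithmetic rather than the primitive recursive building blocks of Lemma 1.15, the helper names (`primes`, `exponent`, `encode`, `s`, `compute`) are hypothetical, the program is assumed non-empty, and `max_steps` is an artificial bound standing in for the unbounded minimalisation $q$.

```python
def primes(count):
    """First `count` primes, by trial division (fine for small counts)."""
    found, candidate = [], 2
    while len(found) < count:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found

def exponent(a, p):
    """The exponent of the prime p in a > 0, i.e. (a)_i when p = p_i."""
    e = 0
    while a % p == 0:
        a //= p
        e += 1
    return e

def encode(values, ps):
    """c(g_0, ..., g_r) = 2^{g_0} 3^{g_1} ... p_r^{g_r}."""
    code = 1
    for p, v in zip(ps, values):
        code *= p ** v
    return code

def s(a, program, ps):
    """One step of the program on the coded configuration a."""
    state = exponent(a, ps[0])
    if state == 0 or state > len(program):
        return a                                  # halted, or no instruction
    instr = program[state - 1]
    j = instr[0]
    a //= ps[0] ** state                          # erase the old state index
    if instr[1] == '+':                           # (j, +, beta)
        return a * ps[j] * ps[0] ** instr[2]
    if exponent(a, ps[j]) == 0:                   # (j, -, beta, gamma), R_j = 0
        return a * ps[0] ** instr[3]
    return (a // ps[j]) * ps[0] ** instr[2]       # (j, -, beta, gamma), R_j > 0

def compute(program, inputs, max_steps=10_000):
    r = max(max(instr[0] for instr in program), len(inputs))
    ps = primes(r + 1)
    a = encode([1, *inputs], ps)                  # h(1, n_1, ..., n_k, 0)
    for _ in range(max_steps):                    # h(.., t + 1) = s(h(.., t))
        if exponent(a, ps[0]) == 0:               # state component is 0: halted
            return exponent(a, ps[1])             # contents of R_1
        a = s(a, program, ps)
    return None

print(compute([(1, '+', 2), (1, '+', 0)], [5]))   # 7
```

On input 5 the coded configurations are $2^1 3^5$, then $2^2 3^6$, then $2^0 3^7$, at which point the state component is 0 and the exponent of 3 is read off as the output.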


1.4. Algorithms and encodings. 

We now have a large class of functions, partial computable functions, definable
both ‘mechanically’ and algebraically. It turns out that many of the
functions we come across in mathematics are partial computable, or even total
computable. These include:

• Arithmetic functions (addition, subtraction, multiplication, division with
remainder).

• Computing reductions mod n.

• Primality testing.

• Computing gcd and lcm.

These functions are, in a sense, the ‘nicest’ functions, from a computational 
standpoint. They are the ones that we can simulate via our computational 
device (register machines). 


Definition 1.17 (Recursive functions and recursive sets). 

A function $f : \mathbb{N}^k \to \mathbb{N}$ is said to be recursive, or computable, if it is total
recursive. A set $X \subseteq \mathbb{N}$ is said to be recursive, or computable, or decidable, if
its characteristic function

$$\chi_X(n) = \begin{cases} 1 & \text{if } n \in X \\ 0 & \text{if } n \notin X \end{cases}$$

is a total recursive (= computable) function. This extends to subsets of $\mathbb{N}^k$.

Thus a function $f$ is recursive (= computable) if we can always compute it,
and a set $X$ is recursive (= computable, decidable) if we can always compute
whether or not a given integer (or tuple) lies in $X$. So such functions, and such
sets, can be completely understood and ‘computed’ with register machines.

We will now flip this idea, and say that register machines are the way to do 
computation. 


Definition 1.18 (Algorithms). 

An algorithm is any process which takes as input some recursive subset of
$\mathbb{N}^k$, and which can be simulated by a register machine. A total algorithm is
one which will always terminate on every element in its input set. A partial
algorithm is one which may fail to terminate on some elements in its input set.

It is important that the input set of the algorithm is recursive, so that we can
pre-test inputs to check that the algorithm can process them [4] [5]. Usually (but
not always), the algorithms we will describe will have $\mathbb{N}^k$ as input set. Just
as it is nonsensical to input the pair (1,2) into an algorithm which takes as
input a single integer, we must also be careful not to ‘break’ our algorithms in
other ways. For example, if we have an algorithm which takes as input a square
number $n$, and outputs the square root of $n$, then we cannot input 5 into this
algorithm. However, the set of square numbers is recursive. In general, we need
to ensure that our input is suitable for the algorithm to start ‘working on’.
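The square-root example can be sketched as follows (an illustration only; the function names are hypothetical). The characteristic function of the set of square numbers is used to pre-test an input before the algorithm proper is run on it. `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt

def chi_squares(n):
    """Characteristic function of the set of square numbers."""
    return 1 if isqrt(n) ** 2 == n else 0

def square_root(n):
    """Defined on the recursive input set of squares; anything else is rejected."""
    if chi_squares(n) == 0:
        raise ValueError(f"{n} is not in the input set of square numbers")
    return isqrt(n)

print(chi_squares(49), chi_squares(5), square_root(49))  # 1 0 7
```

Because `chi_squares` is total, the pre-test itself always terminates, which is exactly what the requirement of a recursive input set buys us.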

[4] Most of the time we are interested in deciding if an integer has a certain property or not,
hence the alternate name decidable.

[5] A blender is an excellent machine for mincing food, but you wouldn’t want to put a brick
in it.

It turns out that not all functions are computable. Moreover, not all functions
are partial computable; we call such functions incomputable.

Lemma 1.19 (Incomputable functions). 

There exists an incomputable function from N fc —> N, for each k > 1. 

Proof. Each partial computable function comes from a finite program of a register
machine (alternatively, from a finite number of applications of composition,
recursion and minimalisation with the finite set of basic functions). Thus there
are at most countably many partial computable functions, yet there are
uncountably many functions from $\mathbb{N}^k \to \mathbb{N}$ for any $k \geq 1$. □

Observe that the above proof actually shows that there are uncountably many
incomputable functions from $\mathbb{N}^k \to \mathbb{N}$, for each $k \geq 1$.

One problem with our definition of an algorithm is that it only takes into
account computations from $\mathbb{N}^k$ to $\mathbb{N}$. Thus, strictly speaking, we can’t consider
the process ‘take a word in the English language, and compute the number of
letters in it’ as a computable function; the input is not a $k$-tuple of integers.
Similarly for the process ‘take an integer $n$, and compute the first word in the
English dictionary with $n$ letters’, as the output is not an integer. As we will
soon see, we need ways to encode our inputs as $k$-tuples, and our outputs as
integers. We start with ways of encoding tuples as integers.

It helps to have a notion of how to produce an ordered list of the elements
of $\mathbb{N}^m$. There are many ways to do this; one such way is called the shortlex
ordering.

Definition 1.20 (Shortlex ordering). 

We define the shortlex ordering on $\mathbb{N}^m$ as follows: $(n_1, \ldots, n_m) < (n'_1, \ldots, n'_m)$
if $\sum_{i=1}^m n_i < \sum_{i=1}^m n'_i$, or $\sum_{i=1}^m n_i = \sum_{i=1}^m n'_i$ and for some $k$ we have $n_i = n'_i$ for all
$1 \leq i \leq k$ but $n_{k+1} < n'_{k+1}$.

We can use shortlex to produce an ‘indexed list’ of $\mathbb{N}^m$: Take all elements
$(n_1, \ldots, n_m)$ with sum of entries $\sum_{i=1}^m n_i = 0$, and order these by shortlex. Then
take all elements $(n_1, \ldots, n_m)$ with sum of entries $\sum_{i=1}^m n_i = 1$, and order these
by shortlex. And so on. Thus, for each $n \in \mathbb{N}$, we can construct the $n$th element
of $\mathbb{N}^m$ in this list. This ‘indexing’ and its inverse (onto the $i$th component for
each $1 \leq i \leq m$) are all computable; we will show how in the next section with
the aid of Church’s thesis.
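
The indexed list just described can be sketched in a few lines of Python. This is only an illustration, not a register machine: the function names are our own, we assume $m \geq 1$, and we assume that indices start at 0. Within a block of tuples with equal entry sum, the shortlex ordering agrees with lexicographic order, which is what the recursion produces.

```python
from itertools import islice


def tuples_with_sum(m, total):
    """Yield all m-tuples of naturals with the given entry sum, in lexicographic order."""
    if m == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in tuples_with_sum(m - 1, total - first):
            yield (first,) + rest


def shortlex(m):
    """Yield every element of N^m: entry sum 0 first, then sum 1, and so on."""
    total = 0
    while True:
        yield from tuples_with_sum(m, total)
        total += 1


def nth_tuple(m, n):
    """The 'indexing': return the element of N^m at position n (counting from 0)."""
    return next(islice(shortlex(m), n, None))


def index_of(m, target):
    """The inverse direction: search the list until the target tuple appears."""
    for position, candidate in enumerate(shortlex(m)):
        if candidate == target:
            return position
```

Both directions terminate on every input, since each tuple appears at some finite position in the list.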

We can use this idea to encode words as integers. Consider the set $\Sigma^*$ of all
words over the finite alphabet $\Sigma$. By placing an ordering $\{a_1, \ldots, a_n\}$ on $\Sigma$, we
can represent each letter $a_i$ of $\Sigma$ by the integer $i$. By restricting the shortlex
ordering to $\{1, \ldots, n\}^m$ for each $m$, we get an induced ordering of $\Sigma^m$ (words
of length $m$) for each $m$: given a word $w \in \Sigma^m$, we can associate to it an
$m$-tuple $(i_1, \ldots, i_m)$ representing the sequence of letters in $w$, and then we use
the shortlex ordering on these associated tuples. Now, to produce an indexed
list of $\Sigma^*$, we first take all words in $\Sigma$ and order them (via their tuples) by the
induced shortlex. Then take all words in $\Sigma^2$ and order them by the induced
shortlex, and then $\Sigma^3$, and so on. Thus, for each $n \in \mathbb{N}$, we can construct the
$n$th element of $\Sigma^*$ in this list.
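
As a rough illustration of this listing, here is a short Python sketch. The names are our own, the alphabet is passed as a string whose character order is taken to be the chosen ordering on the letters, and we follow the description above in starting with the words of length 1. Note that within each length the words are sorted by the sum of their letter indices first, and only then lexicographically, as the induced shortlex ordering requires.

```python
from itertools import count, islice, product


def words_of_length(alphabet, m):
    """Yield the words of length m, ordered by shortlex on their index tuples."""
    index_tuples = product(range(1, len(alphabet) + 1), repeat=m)
    for indices in sorted(index_tuples, key=lambda t: (sum(t), t)):
        yield "".join(alphabet[i - 1] for i in indices)


def words(alphabet):
    """Yield all words of length 1, then length 2, then length 3, and so on."""
    for m in count(1):
        yield from words_of_length(alphabet, m)


def nth_word(alphabet, n):
    """Return the word at position n of the list (counting from 0)."""
    return next(islice(words(alphabet), n, None))


def number_of_letters(alphabet, n):
    """'Take the index for a word, and compute the number of letters in it.'"""
    return len(nth_word(alphabet, n))
```

With the alphabet `"abc"`, the list begins a, b, c, aa, ab, ba, ac, bb, ca; the word `ac` comes after `ba` because its index tuple (1, 3) has a larger sum than (2, 1).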

Using this, we may re-interpret our previous question of ‘take a word in the 
English language, and compute the number of letters in it’ as ‘take the index 
for a word in the English language, and compute the number of letters in it’. 
So it now makes sense to ask ‘is this function computable?’ We will always
require our inputs/outputs to be integers. Thus,

From here on we will take it as a given that the inputs/outputs of our algorithms
are given by codes of various objects, either by explicitly giving an
encoding, or implicitly without going into the details.


So when we ask for “an algorithm to count the number of letters in a word”, 
it is clear what we mean. 

We now give a way of encoding machines as integers, as later we will want to 
compute things about machines. There are various ways to uniformly encode 
programs for register machines as integers. Some of these are bijective (one
program $\leftrightarrow$ one integer). We give an encoding here which is not bijective (in
particular, there are integers which do not correspond to programs).

Definition 1.21 (Encoding programs as natural numbers). 

For each line in the register machine program $Q$ (a triple or quadruple)
corresponding to state $S_i$ ($1 \leq i \leq r$; we disregard $S_0$), we take the triple $(j, +, k)$
and encode it to the integer $2^j \cdot 5^k$, or the quadruple $(j, -, k, l)$ and encode it to
the integer $2^j \cdot 3 \cdot 5^k \cdot 7^l$. Call this integer $t_i$. Now take this sequence $t_1, \ldots, t_r$
and form the integer $n = 2^{t_1} \cdots p_r^{t_r}$ (where $p_i$ denotes the $i$th prime). Write $P_n$ for the program encoded by the
integer $n$.
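
To make the encoding concrete, here is a small Python sketch. It rests on assumptions: we represent a program as a list of instructions for the states $S_1, \ldots, S_r$ in order, each written as a tuple `(j, "+", k)` or `(j, "-", k, l)`, and we take the final code to be $2^{t_1} \cdot 3^{t_2} \cdots p_r^{t_r}$ with $p_i$ the $i$th prime. The function names are our own.

```python
def first_primes(r):
    """Return the first r primes, found by trial division."""
    primes = []
    candidate = 2
    while len(primes) < r:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode_instruction(instruction):
    """Encode (j, '+', k) as 2^j * 5^k, and (j, '-', k, l) as 2^j * 3 * 5^k * 7^l."""
    if instruction[1] == "+":
        j, _, k = instruction
        return 2**j * 5**k
    j, _, k, l = instruction
    return 2**j * 3 * 5**k * 7**l


def encode_program(program):
    """Combine the instruction codes t_1, ..., t_r into the product of p_i^(t_i)."""
    codes = [encode_instruction(instruction) for instruction in program]
    n = 1
    for prime, t in zip(first_primes(len(codes)), codes):
        n *= prime**t
    return n
```

Even tiny programs receive enormous codes: the two-line program `[(1, "+", 2), (1, "-", 1, 0)]` is encoded as $2^{50} \cdot 3^{30}$. This is harmless, since we only care that the encoding can be carried out, not that it is efficient.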

Given that a register machine program $P$ has no intrinsic arity (that is, has
no intrinsic ‘number of variables’ which it needs to take as input), we see that
if $P$ computes a $k$-variable function $f$ then it also computes the $(k-1)$-variable
function $f'$ given by

$f'(n_1, \ldots, n_{k-1}) := f(n_1, \ldots, n_{k-1}, 0)$

by simply inputting $(n_1, \ldots, n_{k-1}, 0)$ into $P$. We now remove this ambiguity:

Definition 1.22 (Functions from register machines).

We write $f_{n,k}$ for the $k$-variable function computed by the register machine with
program $P_n$, if $P_n$ exists (that is, if $n$ actually encodes a program).


We can now adapt Cantor’s diagonal argument to construct an explicit function
which is not partial recursive:


Lemma 1.23 (An explicit function which is not partial recursive).
Define the function $g : \mathbb{N} \to \mathbb{N}$ via

$$g(n) := \begin{cases} f_{n,1}(n) + 1 & \text{if } n \text{ codes a program and } f_{n,1}(n) \text{ is defined} \\ 0 & \text{otherwise} \end{cases}$$


Then this is an explicit definition of a function which is not partial recursive. 


Proof. We proceed by contradiction. Suppose $g$ were partial recursive. Then
there must be some code $N$ for which $g = f_{N,1}$. Now observe what happens if
we try and compute $g(N)$. We see that, as $N$ is a code, if $f_{N,1}(N)$ were defined
then we would have $g(N) = f_{N,1}(N) + 1 \neq f_{N,1}(N) = g(N)$. Thus $f_{N,1}(N)$
is not defined. So by the definition of $g$ we have $g(N) = 0$, thus giving that
$f_{N,1}(N) = 0$, and so $f_{N,1}(N)$ is defined; a contradiction. □

Note that we need the clause ‘$g(n) = 0$ if $f_{n,1}(n)$ is undefined’. If instead
we had that ‘$g(n)$ is undefined if $f_{n,1}(n)$ is undefined’, then $g$ would indeed be
partial recursive, and we will prove this later in Lemma 1.24. To understand
why this is true, we first need to introduce Church’s thesis.




1.5. Church’s thesis. 

We have given our definition of an algorithm in the previous section, in terms 
of register machines (and, equivalently, partial recursive functions). But this 
was not simply an arbitrary definition; it reflects some of our intuition of what 
an algorithm should do. We could, for example, have said that an algorithm is 
something that can be computed by a linear function. But it should be clear 
that this is far too restrictive a definition, and does not capture all the properties
that we would want ‘algorithms’ to exhibit. 

The intuitive idea we have tried to reflect when giving the definition of an 
algorithm is the following: an executable process, in the intuitive sense, is a
step-by-step deterministic process with a finite description at every step, a finite set
of rules, and finite input/output. This, of course, is not a formal definition, 
and so is unsatisfactory to mathematicians. But it is the idea that we want to 
axiomatise, by defining ‘algorithms’ in a suitable way. 

We have seen that two seemingly independent definitions of ‘functions we can 
compute’, namely partial recursive functions and partial computable functions, 
actually yield the same set of functions. Alonzo Church 6 made the assertion
(known as Church’s thesis) that any abstract theory of finite computation (i.e.,
a theory of computation whose processes are executable processes, in the above
intuitive sense) will always yield a set of partial computable functions which
is contained in the set of partial recursive functions defined in Definition 1.10.
Later in the course we will see theories of finite computation which define a
strictly smaller set of partial computable functions. Church himself defined
finite computation via the $\lambda$-calculus, whose set of partial computable functions
is also identical to the set of partial recursive functions.

That is to say, Church asserted that the definition of an algorithm from 
Definition 1.18 is indeed the ‘correct’ definition to take, as it most accurately 
represents our intuitive understanding 7.

Of course, we cannot prove that all theories of finite computation lead to the 
same set of partial computable functions, as we don’t know them all! However, 
every abstract theory of finite computation which has been proposed so far has 
been verified (mathematically) to compute at most the set of partial recursive 
functions (and this includes quantum computers; they are faster, but not better,
than existing computing machines 8).

So we now state the first (of three) parts of Church’s thesis. Think of this as 
the “definition” part, where we have argued (philosophically) that our definition 
of an algorithm accurately reflects our intuitive understanding. 

Church’s thesis 1. 

Any abstract theory of finite computation $C$ will give at most the set of partial
recursive functions as its set of $C$-partial computable functions from $\mathbb{N}^k$ to
$\mathbb{N}$. Thus the most powerful theory of finite computation is given by register
machines and their many equivalents.

That is, the definition of an algorithm from Definition 1.18 is the ‘correct’ 
definition to work with. 

6 Alan Turing made the same assertion, hence this is often referred to as the Church-Turing 
thesis. For brevity, we shall continue to call it Church’s thesis. 

7 Just like the definition of a continuous function; something that was the subject of (100
years of) debate, but is accepted now as the ‘correct’ definition to work with.

8 They still use finitely-many bits in their computation, so are a high-tech variation on a
classical theme, in the same way that a nuclear reactor is a high-tech kettle.

Next, consider the encoding of register machines as natural numbers, as done
in Definition 1.21. As mentioned, this encoding is not bijective; if integer $n$
encodes a program $P_n$ then $(n)_j$ can only have prime divisors 2, 3, 5, 7. But
we can describe an executable process to determine if an integer represents a
program; the reverse of Definition 1.21 where we break down $n$ into products
of distinct prime powers, then break down those powers and make sure they
only have prime factors 2, 3, 5, 7 (where the power of 3 is at most 1). A long
and tedious exercise would be to verify, either by constructing a suitable register
machine, or by composing suitable partial recursive functions, that the following
(characteristic) function $f : \mathbb{N} \to \mathbb{N}$ given by:

$$f(n) := \begin{cases} 1 & \text{if } n \text{ codes a program} \\ 0 & \text{otherwise} \end{cases}$$

is indeed a computable function.

But even without seeing a full proof that such a register machine / total 
recursive function exists, you probably already have a good idea of how one 
might go about building such an algorithm, from the description of it given 
above: “Break down n into products of distinct prime powers, then break down 
those powers and make sure they only have prime factors 2, 3, 5, 7, where the
power of 3 is at most 1.” In fact, you may already feel that such a description 
would actually be sufficient proof that there is an algorithm to do it; you don’t 
need to see it written out in complete detail. 
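
For readers who would nonetheless like to see something written out, here is a Python sketch of the check exactly as worded above. It is a sketch under assumptions: we take the code to be a product of prime powers whose exponents are the instruction codes, we treat 0 and 1 as not coding a program, and we do not test any further conditions that a full proof might need (for example, that the primes used are the first $r$ primes, or that the state indices are in range). The function names are our own.

```python
def factorise(n):
    """Return {prime: exponent} for an integer n >= 1, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def looks_like_instruction(t):
    """Check that t has only prime factors 2, 3, 5, 7, with the power of 3 at most 1."""
    factors = factorise(t)
    return set(factors) <= {2, 3, 5, 7} and factors.get(3, 0) <= 1


def codes_program(n):
    """Return 1 if every prime power in n has an instruction-like exponent, else 0."""
    if n < 2:
        return 0  # assumption: 0 and 1 are not treated as codes
    exponents = factorise(n).values()
    return 1 if all(looks_like_instruction(t) for t in exponents) else 0
```

Every loop here is bounded by the size of the input, so the process always terminates; this is the informal reason the function is total.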

Well, Alonzo Church felt the same way. And so this brings us to the second 
part of Church’s thesis; the idea that having a full step-by-step description of 
an executable process is sufficient proof that there exists an algorithm to carry 
out the process (given a suitable encoding). Think of this as the “my arguments 
should be enough to convince you” part of Church’s thesis. 

Church’s thesis 2. 

Any informal written description of a step-by-step deterministic process with a
finite description at every step, a finite set of rules, and finite input/output,
starting with some tuple in $\mathbb{N}^k$ and with only integer output, is equivalent to
some register machine computation.

This may seem counter-intuitive at first; how can a description of an executable
process be enough to prove that there is an algorithm (= register
machine) that performs the computation? Surely such a ‘proof’ is not rigorous,
as it is simply a convincing argument. But most ‘proofs’ that we see in mathematics
are exactly that: a convincing argument. Unless you’re in the habit of
doing all your proofs via predicate logic 9, they are no more than ‘convincing
arguments’. So that is what you have seen here. The description of the process 
should be enough to convince you that it can (with some effort) be simulated 
by a register machine. 

But we can say even more than this. Thinking back to the example above
again, of checking if an integer $n$ encodes a register machine, we see that the
verbal/written description given doesn’t just show that the process can be made
algorithmic; it describes what the algorithm does, in sufficient detail to allow
us to write down the register machine program explicitly (albeit with a bit of
effort). In the same way that we encoded words over alphabet $\Sigma$ as integers, we
can also encode all finite phrases in the English language as integers (we omit
the details here; it is very similar to the previous example). Thus we can now
state the last part of Church’s thesis; think of this as the “If I can describe it,
I can build it” part of Church’s thesis.

9 Do not do all your proofs via predicate logic, unless you want to spend the next 1000
years completing tripos.

Church’s thesis 3. 

There is a total algorithm (= recursive function $h : \mathbb{N} \to \mathbb{N}$) that, given a code
$n$ for such a finite English description of a process as in part 2 of Church’s
thesis, will produce a code $h(n)$ for a register machine $P_{h(n)}$ that carries out the
process so described.

So what we’ve said here is that having a complete verbal/written description 
of an executable process is as good as having the actual register machine that 
carries out the computation; we can always recover one from the other in a 
computable manner. Thus, producing the explicit register machine is equivalent 
to producing a verbal/written description of the process. 

We summarise the three parts of Church’s thesis here. 

Church’s thesis. 

(1) Any abstract theory of finite computation $C$ will give at most the set of
partial recursive functions as its set of $C$-partial computable functions
from $\mathbb{N}^k$ to $\mathbb{N}$. Thus the most powerful theory of finite computation is
given by register machines and their many equivalents.

That is, the definition of an algorithm from Definition 1.18 is the
‘correct’ definition to work with.

(2) Any informal written description of a step-by-step deterministic process
with a finite description at every step, a finite set of rules, and finite
input/output, starting with some tuple in $\mathbb{N}^k$ and with only integer output,
is equivalent to some register machine computation.

(3) There is a total algorithm (= recursive function $h : \mathbb{N} \to \mathbb{N}$) that, given
a code $n$ for such a finite English description of a process as in part 2
of Church’s thesis, will produce a code $h(n)$ for a register machine $P_{h(n)}$
that carries out the process so described.

Now we can start to appeal to Church’s thesis to show that certain functions 
are indeed partial recursive. First, in a slight abuse of notation, and making use 
of part 2 of Church’s thesis, when we say things like “We describe an algorithm 
to compute XYZ...”, we really mean “We describe a step-by-step deterministic 
process with a finite description at every step, a finite set of rules, and finite 
input/output, to compute XYZ...” Observe that, by part 2 of Church’s thesis 
and our previous discussion on encodings, we know that this is equivalent to 
having built a register machine to carry out the process. Thus, even though 
we have defined ‘algorithm’ in a very strict sense via register machines, we can 
interpret it more broadly now. 

We previously showed some of the following functions to be recursive: 

• Arithmetic functions (addition, subtraction, multiplication, division with
remainder).

• Computing reductions mod $n$.

• Primality testing.

• Computing gcd and lcm.

All the above processes have deterministic step-by-step descriptions with all
the necessary finiteness conditions, and so we can simply argue by Church’s
thesis that there exist register machines which can compute them. This is
substantially simpler than explicitly constructing the necessary register machines
or partial recursive functions.

Recall that we earlier stated, without proof, that recognising whether an 
integer represented a register machine code was computable. Well, now we can 
simply argue this by Church’s thesis. The reverse of Definition 1.21, where we 
break down n into products of distinct prime powers, then break down those 
powers and make sure they only have prime factors 2, 3, 5, 7 (where the power of 
3 is at most 1), is a step-by-step deterministic process with a finite description 
at every step, a finite set of rules, and finite input/output, starting with some 
integer and with only integer output. Thus, by Church’s thesis, we see that the 
following function

$$f(n) := \begin{cases} 1 & \text{if } n \text{ codes a program} \\ 0 & \text{otherwise} \end{cases}$$

is indeed a computable function.

However, Church’s thesis cannot be applied to a description of a process if 
it contains an ‘existential step’. For example, consider the description ‘Take 
a register machine program P defining a function on 1 variable and, if it has 
a non-empty domain, output the smallest integer on which it halts’. Such an 
integer is, if it exists, well-defined. However, this does not translate into a 
step-by-step process for computation. 

We can now use Church’s thesis to clarify some of the claims we made in
the previous section. For example: when we defined the shortlex ordering
(Definition 1.20) to produce an indexed list of $\mathbb{N}^m$, we said that this indexing and
its inverse (onto the $i$th component for each $1 \leq i \leq m$) are all computable. This
is now immediate; we gave a step-by-step deterministic process for producing
the indexing, and thus by Church’s thesis we see that there is an algorithm
(= register machine) which computes this, and so the indexing function and its
$m$ inverses are all computable (and obviously total).

Here is another example, which is a variant of Lemma 1.23. 


Lemma 1.24 (An explicit function which is partial recursive).
Define the function $g : \mathbb{N} \to \mathbb{N}$ via

$$g(n) := \begin{cases} f_{n,1}(n) + 1 & \text{if } n \text{ codes a program and } f_{n,1}(n) \text{ is defined} \\ \text{undefined} & \text{otherwise} \end{cases}$$

Then this gives a function which is partial recursive.


Proof. To see this, we again appeal to Church’s thesis: we have a description of
a process to compute the values of $g(n)$ when it is defined. That is, we simply
check if $n$ codes a program, and if it does, take the program $P_n$ and run it with
input $n$. If this process eventually terminates, then add one to $R_1$ and output
its contents. This is a description of a process to compute $g$, and so $g$ is partial
recursive (but not necessarily total). □
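
The ‘run it and add one’ step can be illustrated with a toy simulator in Python. Everything here is a sketch under assumptions, which we state loudly: we assume that $(j, +, k)$ means ‘add 1 to $R_j$ and go to state $S_k$’, that $(j, -, k, l)$ means ‘if $R_j > 0$ then subtract 1 and go to $S_k$, otherwise go to $S_l$’, that the machine starts in $S_1$ with its inputs in $R_1, R_2, \ldots$, and that it halts in $S_0$ with its output in $R_1$. The step bound `max_steps` is our own addition so that the example always returns; a genuine run of a non-halting program would simply never finish.

```python
def run(program, inputs, max_steps):
    """Simulate a program (dict: state -> instruction) for at most max_steps steps.

    Returns the contents of R_1 if the halting state S_0 is reached, else None.
    """
    registers = {i + 1: value for i, value in enumerate(inputs)}
    state = 1  # assumption: computation starts in state S_1
    for _ in range(max_steps):
        if state == 0:
            return registers.get(1, 0)
        instruction = program[state]
        j = instruction[0]
        if instruction[1] == "+":
            registers[j] = registers.get(j, 0) + 1
            state = instruction[2]
        elif registers.get(j, 0) > 0:
            registers[j] -= 1
            state = instruction[2]
        else:
            state = instruction[3]
    return registers.get(1, 0) if state == 0 else None


def g_value(program, n, max_steps):
    """Run the program on input n; if it halts in time, return its output plus one."""
    output = run(program, (n,), max_steps)
    return None if output is None else output + 1
```

For example, the hypothetical program `{1: (1, "+", 2), 2: (1, "+", 0)}` adds 2 to its input, while `{1: (1, "+", 1)}` never reaches $S_0$, so `run` returns `None` on it however large the step bound.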

Note that, in the above example, we have used Church’s thesis to show that 
the function g is partial recursive. We have not shown that g is computable (= 
total recursive), and indeed g is definitely not total. Church’s thesis can be used 
to show that a function is partial recursive (= can be simulated by a register 
machine), but often we need to use some additional mathematical argument(s) 
to verify when such functions are total. 

1.6. Recursively enumerable sets and diagonalisation.

We have looked at recursive sets; those $X$ for which the characteristic function
$\chi_X$ is total recursive. But what if we only had a partial recursive function $f$
which, if defined, matches $\chi_X$, but whose domain of definition merely contains
$X$ (but not necessarily $\mathbb{N} \setminus X$)? That is, $f$ can tell us if $n \in X$, but doesn’t
always say when $n \notin X$.

Now that we know that the partial recursive functions are precisely the partial
computable functions, we will write $f(n)\uparrow$ to mean ‘$f(n)$ is undefined’,
and $f(n)\downarrow$ to mean ‘$f(n)$ is defined’, matching the notion of when a partial
computable function doesn’t/does halt.

We now need the notion of diagonalising an algorithm. Suppose we have a
partial recursive function $f : \mathbb{N} \to \mathbb{N}$. Suppose also that we would like to know
if $f(7)$ is defined. Then we simply take a register machine with program $P_n$
which computes $f$, and start running $P_n(7)$. That is, we put 7 in register $R_1$,
then apply the instructions of $P_n$ step-by-step. If we reach the halting state
$S_0$ in some finite number of steps, then we stop and can conclude that $f(7)$
is defined. In actual fact, we have just described an algorithm which takes as
input a register machine $P_m$ for any code $m$ and, if $P_m(7)\downarrow$, will halt and
confirm this (but will not halt if $P_m(7)\uparrow$). Thus, by Church’s thesis, there is
some register machine with program $Q$ which simulates this. That is,

$$Q(m) = \begin{cases} 1 & \text{if } m \text{ codes a program and } P_m(7)\downarrow \\ \uparrow & \text{otherwise} \end{cases}$$

Now suppose we take the same partial function $f$, and we want to know if
either $f(7)$ or $f(9)$ is defined. We could do as before, and take $P_n$ with input
7 and run the register machine. If $P_n(7)\downarrow$ then this process will halt and tell
us ‘yes’. But what if $P_n(7)\uparrow$ but $P_n(9)\downarrow$? Then we would need to ‘wait until
infinity’ for $P_n(7)$ to finish, before moving on to computing the steps of $P_n(9)$.
What we need to do is to diagonalise the algorithms. That is, do one step of
$P_n(7)$, then one step of $P_n(9)$, then another step of $P_n(7)$, then another step of
$P_n(9)$, and so on. If either of these eventually halts (and say it is $P_n(9)$ after 1460
steps), then our algorithm will terminate after $2 \times 1460$ steps with the answer
‘yes’, which is what we want! It helps to write $P_n^t(k)$ to mean ‘the register
machine $P_n$, on input $k$, after $t$ computational steps’ (this would be completely
described by the integer h(l,k,t ) from the proof of Theorem 1.16 ).

Here is a picture of the order of the computational steps in the process we 
described: 


$P_n^1(7) \to P_n^1(9) \to P_n^2(7) \to P_n^2(9) \to P_n^3(7) \to P_n^3(9) \to \cdots$
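
The alternating process described above can be mimicked in Python with generators, where each call to `next` performs one computational step. This is a toy model rather than a register machine: `toy_run` is a hypothetical stand-in for a run of $P_n$ on one input, and the halting step is simply supplied as a parameter (with `None` modelling a run that never halts).

```python
from itertools import count


def toy_run(halting_step):
    """Stand-in for a run of P_n: each step yields True exactly when S_0 is reached."""
    for t in count(1):
        yield t == halting_step


def dovetail(runs):
    """Do one step of each run in turn; stop as soon as any run halts.

    Returns the label of the run that halted and the total number of steps done.
    Never returns if none of the runs halts.
    """
    total_steps = 0
    while True:
        for label, run in runs.items():
            total_steps += 1
            if next(run):
                return label, total_steps
```

With `dovetail({"P_n(7)": toy_run(None), "P_n(9)": toy_run(1460)})` the process stops after $2 \times 1460 = 2920$ steps and reports that it was the run on input 9 which halted, matching the count given in the text.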


Now suppose we wanted to know ‘does $f$ halt on any input?’ Again, we can
diagonalise. But this time we need an infinite diagonal process, which explores
infinitely many computational processes at once. To do this, we first need to
do one step of $P_n(1)$, then one step of $P_n(2)$ followed by one more step of
$P_n(1)$, then one step of $P_n(3)$ followed by one more step of $P_n(2)$ followed by
one more step of $P_n(1)$, and so on. This is a much larger diagonal process,
but can still be simulated by one register machine as we have given a complete
verbal description of the algorithm. Here is a diagram of the first 10 steps of
this process. To make it clearer, we have written [t] next to the $t$th step of the
process.

[1] $P_n^1(1)$    [2] $P_n^1(2)$    [4] $P_n^1(3)$    [7] $P_n^1(4)$ ...

[3] $P_n^2(1)$    [5] $P_n^2(2)$    [8] $P_n^2(3)$ ...

[6] $P_n^3(1)$    [9] $P_n^3(2)$ ...

[10] $P_n^4(1)$ ...


In general, when we have several register machines with an input each, and 
we want to run them all at the same time until one of them halts, we can do 
so via diagonalisation. Moreover, this is an algorithmic process (we have given 
a full description of the algorithm, above). So by Church’s thesis there is a 
register machine to do this computation, and we can construct such a machine 
from the description given. 
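
The order of steps in the infinite diagonal process can be generated explicitly. The Python sketch below is a toy model: `halting_step` is a hypothetical function telling us at which step the run on input $k$ halts (or `None` if it never does), standing in for actually simulating $P_n$ step by step.

```python
from itertools import count, islice


def schedule():
    """Yield pairs (k, t): 'perform step number t of the run on input k'.

    Stage s starts the run on input s, then advances the runs on
    inputs s-1, ..., 1 by one further step each.
    """
    for stage in count(1):
        for k in range(stage, 0, -1):
            yield k, stage - k + 1


def first_halt(halting_step):
    """Follow the schedule until some run halts; return that (input, step) pair.

    Does not return if no run ever halts.
    """
    for k, t in schedule():
        if halting_step(k) == t:
            return k, t
```

The first ten pairs produced by `schedule()` are (1,1), (2,1), (1,2), (3,1), (2,2), (1,3), (4,1), (3,2), (2,3), (1,4), which is the numbering [1] to [10] in the diagram.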

Definition 1.25 (Recursively enumerable sets). 

A set $E \subseteq \mathbb{N}$ is said to be recursively enumerable 10 (abbreviated to r.e.) if the
function $\phi_E$ defined by

$$\phi_E(n) := \begin{cases} 0 & \text{if } n \in E \\ \uparrow & \text{otherwise} \end{cases}$$

is partial recursive.

So $\phi_E$ will always tell us if $n \in E$, but won’t say anything about when $n \notin E$.
We will now show why recursively enumerable sets are named so; it is because we
can enumerate them in a recursive manner (that is, start a recursive process
which eventually outputs each element in the set).

Theorem 1.26 (Equivalent definitions of recursively enumerable sets). 

For a set $E \subseteq \mathbb{N}$, the following are equivalent:

a) $E = \{f_{n,k}(m_1, \ldots, m_k) \mid (m_1, \ldots, m_k) \in \mathbb{N}^k\}$ for some fixed $k \geq 1$ and some
fixed $n$. That is, $E$ is the range of some partial recursive function on some
number of variables.

b) $E = \{m \in \mathbb{N} \mid f_{n,1}(m)\downarrow\}$ for some fixed $n$. That is, $E$ is the domain of
definition of some partial recursive function on 1 variable.

c) The function $\phi_E$ defined by

$$\phi_E(n) := \begin{cases} 0 & \text{if } n \in E \\ \uparrow & \text{otherwise} \end{cases}$$

is partial recursive. That is, $E$ is recursively enumerable.

d) The function $\psi_E$ defined by

$$\psi_E(n) := \begin{cases} n & \text{if } n \in E \\ \uparrow & \text{otherwise} \end{cases}$$

is partial recursive.

10 Computability theorists sometimes call these computably enumerable, abbreviated to c.e.


Proof. 

(b) $\Rightarrow$ (c): Given a program $P_n$ for computing the function $f_{n,1}$ with domain
$E$, we modify it by inserting an instruction which is ‘triggered’ before the halting
state; this empties register $R_1$ and then proceeds to the halting state. Explicitly,
we add a state $S_{n+1}$ (where $S_0, \ldots, S_n$ are the existing states). Then, for each
instruction of the form $(j, +, 0)$ or $(j, -, k, 0)$ or $(j, -, 0, l)$, we replace it with
$(j, +, n+1)$ (resp. $(j, -, k, n+1)$, $(j, -, n+1, l)$). Then we add the instruction
$(1, -, n+1, 0)$ for state $S_{n+1}$.

(c) $\Rightarrow$ (d): Given a program $P$ for computing $\phi_E$, let $r$ be the upper register
index of $P$. Now insert a new state/instruction pair which copies $R_1$ to $R_{r+1}$
at the beginning of the computation, and another state/instruction pair which
adds $R_{r+1}$ to $R_1$ just before the program reaches the halting state.

(d) $\Rightarrow$ (a): This is immediate.

(a) $\Rightarrow$ (b): Given the program $P_n$ for computing the function $f_{n,k} : \mathbb{N}^k \to \mathbb{N}$
with range $E$, we describe the following algorithm $Q$. For each $t \in \mathbb{N}$, we start a
diagonal process which starts computing $f_{n,k}$ for all of its inputs (recall that we
have an ordered listing of all the elements of $\mathbb{N}^k$). Each time $f_{n,k}(m_1, \ldots, m_k)$
halts in this diagonal process, compare the output to $t$; if we eventually find one
such output is equal to $t$, then $Q$ terminates on $t$ and outputs 1 (if we never find
such an output, then $Q$ is undefined on $t$). As we have given a full description
of the algorithm for $Q$, then by Church’s thesis we can find a register machine
which computes $Q$, and thus a partial computable function whose domain is
$E$. □


Since the definition of r.e. is one of the conditions above (c), then all the above 
conditions are equivalent to being r.e. We will make use of these conditions 
interchangeably in later proofs, depending on which is more convenient to work 
with at the time. 


1.7. Properties of recursively enumerable sets. 

There is actually a stronger form of Theorem 1.26 (a) when considering
non-empty r.e. sets:

Theorem 1.27. Let $E \subseteq \mathbb{N}$ be non-empty. Then $E$ is r.e. if and only if it is
the range of some total recursive function on some number of variables.

Proof. By Theorem 1.26 (a), the range of a total recursive function will be
r.e. and non-empty. So we need only show that, if $E$ is r.e. and non-empty,
then it is the range of some total recursive function. If $E$ is finite, then it can
be written $E = \{e_1, \ldots, e_k\}$ for some $k \geq 1$. So we can define a total recursive
function $f : \mathbb{N} \to \mathbb{N}$ whose range is $E$ by:

$$f(n) := \begin{cases} e_n & \text{if } n \leq k \\ e_k & \text{if } n > k \end{cases}$$

If $E$ is infinite, then it is the domain of some partial recursive function $g : \mathbb{N} \to
\mathbb{N}$. So we start a diagonal process which starts computing $g(n)$ for each $n \in \mathbb{N}$.
Then we define a new function $f$ as follows: we assign $f(1)$ to be the first such $n$
for which the diagonal process gives that $g(n)$ halts, $f(2)$ to be the second such
$n$ for which the diagonal process gives that $g(n)$ halts, and so on. By Church’s
thesis, we have defined a total recursive function $f$ on 1 variable whose range
is precisely $E$.

Note that when we say ‘first such $n$’ when defining $f$, we mean ‘when running
the diagonal process, the $n$ for which the diagonal process gives the first
conclusive halting of $g$’. For example, we might have that $g(5)$ and $g(8)$ both
halt, but in the diagonal process we see $g(8)$ halt long before we see $g(5)$ halt;
in this case we would have $f(a) = 8$ and $f(b) = 5$ for some pair $a < b$. □
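
The construction of $f$ from $g$ can be imitated in Python. As before this is only a toy model: `halting_step(n)` is a hypothetical function giving the step at which the computation of $g(n)$ halts (or `None` if it never halts), and the diagonal schedule is the one described in Section 1.6, starting a new input at each stage and then advancing all earlier inputs by one step.

```python
from itertools import count, islice


def enumerate_domain(halting_step):
    """Yield the inputs on which g halts, in the order the diagonal process sees them."""
    for stage in count(1):
        for n in range(stage, 0, -1):
            step = stage - n + 1  # which step of the run on input n we perform now
            if halting_step(n) == step:
                yield n


def toy_halting_step(n):
    """Toy g: slow to halt on 5, quick to halt on every even n, undefined elsewhere."""
    if n == 5:
        return 100
    return 2 if n % 2 == 0 else None
```

Here `list(islice(enumerate_domain(toy_halting_step), 4))` begins 2, 4, 6, 8, and the input 5 only appears much later, even though $5 < 8$. So the resulting $f$ lists the domain of $g$ in order of halting, not in increasing order.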


Given condition (b) of Theorem 1.26, we see that each register machine program
$P_n$ corresponds to a recursively enumerable set, and in particular this set
is the domain of definition of $f_{n,1}$. For simplicity, we will write the domain of
definition of $f_{n,1}$ from here on as

$W_n := \{x \in \mathbb{N} \mid f_{n,1}(x)\downarrow\}$

We call $W_n$ the $n$th recursively enumerable set. Of course, this definition is only
valid when $n$ actually encodes a register machine program $P_n$.

It turns out that recursive sets and recursively enumerable sets are closely 
related: 


Theorem 1.28. Let $E \subseteq \mathbb{N}$. Then $E$ is recursive if and only if both $E$ and
$\mathbb{N} \setminus E$ are recursively enumerable.

Proof. If $E$ is recursive then its characteristic function $\chi_E$ is a total recursive
function. Thus the functions $\phi_E$ and $\phi_{\mathbb{N} \setminus E}$ from Definition 1.25 are partial
recursive. This can be seen by defining an algorithm which takes an integer $n$
and computes $\chi_E(n)$; if this is 1 then the algorithm outputs 0, and if this is 0
then the algorithm enters a non-terminating loop. By Church’s thesis, we have
given a description of $\phi_E$, which means that it is partial recursive. A similar
argument works for $\phi_{\mathbb{N} \setminus E}$.

Conversely, if both $E$ and $\mathbb{N} \setminus E$ are recursively enumerable, then $\phi_E$ and
$\phi_{\mathbb{N} \setminus E}$ are partial recursive. So, for each $n \in \mathbb{N}$, start a diagonal process which
computes $\phi_E(n)$ and $\phi_{\mathbb{N} \setminus E}(n)$; if (and only if) $\phi_E(n)$ halts, then $\chi_E(n) = 1$,
and if (and only if) $\phi_{\mathbb{N} \setminus E}(n)$ halts then $\chi_E(n) = 0$. Moreover, precisely one of
these will halt for each $n$. As we have given a full description of an algorithm
computing $\chi_E$, then by Church’s thesis it is a total recursive function, and
hence $E$ is recursive. □
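
The second half of this proof is easy to imitate in Python. In the sketch below (a toy model, with names of our own choosing) a ‘semi-decider’ for a set is a function which, given $n$, returns a step-by-step run that halts precisely when $n$ lies in the set. We build such runs artificially from a membership test, purely so that there is something to run; the point is that `characteristic` only ever advances the two runs one step at a time.

```python
from itertools import count


def semi_decider(member):
    """Build a toy semi-decider: its run on n halts (yields True) iff member(n)."""
    def run(n):
        for t in count(1):
            # Halt after n + 1 steps on members; never halt on non-members.
            yield member(n) and t == n + 1
    return run


def characteristic(phi_E, phi_complement, n):
    """Alternate steps of the two runs on n; exactly one of them must halt."""
    run_in, run_out = phi_E(n), phi_complement(n)
    while True:
        if next(run_in):
            return 1
        if next(run_out):
            return 0
```

Taking $E$ to be the even numbers, with `phi_even = semi_decider(lambda n: n % 2 == 0)` and `phi_odd = semi_decider(lambda n: n % 2 == 1)`, the call `characteristic(phi_even, phi_odd, n)` returns 1 for even $n$ and 0 for odd $n$, and always terminates.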


There are many interesting examples of r.e. sets. 

Definition 1.29 (Diophantine sets). 

A set $X \subseteq \mathbb{N}^k$ is Diophantine if there is an integer polynomial $P$ on $k + l$
variables such that

$X = \{(n_1, \ldots, n_k) \in \mathbb{N}^k \mid (\exists (m_1, \ldots, m_l) \in \mathbb{N}^l)(P(n_1, \ldots, n_k, m_1, \ldots, m_l) = 0)\}$

Lemma 1.30. Every Diophantine set is r.e.

Proof. Let $X \subseteq \mathbb{N}^k$ be Diophantine, with associated integer polynomial $P$ on
$k + l$ variables. Then, given $(n_1, \ldots, n_k) \in \mathbb{N}^k$, we can start a diagonal process
which starts computing $P(n_1, \ldots, n_k, m_1, \ldots, m_l)$ for every $(m_1, \ldots, m_l) \in \mathbb{N}^l$
(recall that we have an ordered listing of all elements of $\mathbb{N}^l$). If any of these
parallel computations ever returns the value 0, then we say that $(n_1, \ldots, n_k) \in X$. This is a
complete description of an algorithm which halts iff $(n_1, \ldots, n_k) \in X$, and thus
by Church’s thesis we have that $X$ is r.e. □
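
As an illustration of this search, here is a Python sketch. It makes some simplifying assumptions: the polynomial is passed as an ordinary Python function, we assume $l \geq 1$, the candidate tuples are listed by their largest entry rather than by shortlex (any computable listing of $\mathbb{N}^l$ would do), and the optional `max_tries` bound is our own addition so that the example can give up; without it, the search runs forever on inputs outside $X$.

```python
from itertools import count, product


def witnesses(l):
    """Yield every l-tuple of naturals exactly once, ordered by largest entry."""
    for bound in count(0):
        for candidate in product(range(bound + 1), repeat=l):
            if max(candidate) == bound:
                yield candidate


def semi_decide(P, ns, l, max_tries=None):
    """Search for (m_1, ..., m_l) with P(ns, ms) = 0; return the first one found.

    Returns None only if max_tries candidates were tested without success.
    """
    for tries, ms in enumerate(witnesses(l)):
        if max_tries is not None and tries >= max_tries:
            return None
        if P(*ns, *ms) == 0:
            return ms
```

For example, with `P = lambda n, a, b: n - a*a - b*b` the set $X$ is the set of sums of two squares. Then `semi_decide(P, (5,), 2)` returns `(1, 2)`, since $5 = 1^2 + 2^2$, whereas on the input 3 the unbounded search would never halt.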


Surprisingly, the converse statement to the above lemma is also true. It is
a deep result, posed as one of Hilbert’s famous problems from 1900 (his 10th
problem) and solved by Matijasevic, but is beyond the scope of this course. We
state it here for completeness.

Theorem 1.31 (Matijasevic’s theorem). Every r.e. set is Diophantine.

Recursively enumerable sets satisfy some useful closure properties.


Lemma 1.32. Let $E \subseteq \mathbb{N}$ be an r.e. set. Then the set $C(E) \subseteq E$ given by

$C(E) := \{e \in E \mid e \text{ is a code for a register machine}\}$

is also r.e. Moreover, given an index $n$ such that $W_n = E$, we can construct an
index $m$ such that $W_m = C(E)$.


Proof. Write $C$ for $C(E)$. Recall that the function $f : \mathbb{N} \to \mathbb{N}$ given by

$$f(n) := \begin{cases} 1 & \text{if } n \text{ codes a program} \\ 0 & \text{otherwise} \end{cases}$$

is recursive. So, take an enumeration for $E$ (the domain of some partial recursive
function $g$). Begin a diagonal process which starts computing $g(n)$ for each
$n \in \mathbb{N}$. For each $n$ on which $g(n)$ halts, test if $n$ is a code using $f$. If so, output
0; if not, enter some non-terminating loop. As this is a complete description of
an algorithm which computes $\phi_C$, we can apply Church’s thesis to conclude that
$\phi_C$ is partial recursive, and thus that $C$ is r.e. Moreover, a further application
of Church’s thesis allows us to compute an index $m$ for a register machine $P_m$
which computes $\phi_C$. That is, $\phi_C = f_{m,1}$, and so $C = W_m$. □


Theorem 1.33 (Unions and intersections of r.e. sets). 

1. Let $I \subseteq \mathbb{N}$ be a (possibly infinite) r.e. set of integers, and $I' \subseteq I$ those which
are codes for register machines. Then the union of r.e. sets

$\bigcup_{n \in I'} W_n$

is again r.e.

2. Let $J \subseteq \mathbb{N}$ be a finite set of integers, and $J' \subseteq J$ those which are codes for
register machines. Then the intersection of r.e. sets

$\bigcap_{n \in J'} W_n$

is again r.e. This does not extend to an infinite intersection, even if the index
set is recursive.

In both case 1 and 2, we can construct an index for the union/intersection of
these r.e. sets.

Proof.

1. For each $x \in \mathbb{N}$, begin a diagonal process which starts computing $f_{n,1}(x)$ for
each $n \in I'$, and look to see if any of these halt. This process will terminate iff
$x$ lies in at least one $W_n$. As this is a description of an algorithm to enumerate
$\bigcup_{n \in I'} W_n$, we have that it is thus r.e. by Church’s thesis, and moreover this
explicit description allows us to construct an index for the r.e. set $\bigcup_{n \in I'} W_n$.

2. For each $x \in \mathbb{N}$, begin a diagonal process which starts computing $f_{n,1}(x)$
for each $n \in J'$, and look to see if all of these halt. This process will terminate
iff $x$ lies in all the $W_n$ ($n \in J'$). As this is a description of an algorithm
to enumerate $\bigcap_{n \in J'} W_n$, we thus have that it is r.e. by Church’s thesis, and
moreover this explicit description allows us to construct an index for the r.e. set
$\bigcap_{n \in J'} W_n$. □
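
The ‘diagonal process’ used in part 1 can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: computations are modelled as Python generators that yield once per step, halts_after is a hypothetical stand-in for running a register machine, and max_rounds is a cut-off added only so the sketch can be exercised on inputs where nothing halts.

```python
from itertools import count

def halts_after(k):
    """Hypothetical stand-in for a computation: it takes k steps and then
    halts. k=None models a computation that never halts."""
    def run():
        steps = count() if k is None else range(k)
        for _ in steps:
            yield  # one computation step
    return run

def dovetail_any(machines, max_rounds=None):
    """In round r, start machine r (if there is one) and then advance every
    machine started so far by one step. Return the index of the first
    machine that is seen to halt."""
    running = {}
    machines = iter(machines)
    for r in count():
        if max_rounds is not None and r >= max_rounds:
            return None  # cut-off reached; no conclusion can be drawn
        nxt = next(machines, None)
        if nxt is not None:
            running[r] = nxt()
        for idx, proc in list(running.items()):
            try:
                next(proc)
            except StopIteration:
                return idx

first_to_halt = dovetail_any([halts_after(None), halts_after(5), halts_after(1)])
```

Because each round only advances every started machine by one step, a non-halting machine early in the list cannot block the discovery of a halting one later on, even when the list of machines is infinite.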

1.8. Universality and undecidability. 

We saw, in Lemma 1.23, a function g that was constructed to ‘contradict’ 
every partial computable function on 1 variable at least once. We will now invert 
this idea, to construct a function u which simulates every partial computable 
function, simultaneously. 

Theorem 1.34 (Universal partial recursive function). 

There exists a partial recursive function $u : \mathbb{N}^3 \to \mathbb{N}$ such that

$u(n, k, m) = \begin{cases} r & \text{if } n \text{ codes a program, } m \text{ codes a } k\text{-tuple of integers } ((m)_1, \ldots, (m)_k), \\ & \text{and } f_{n,k}((m)_1, \ldots, (m)_k) \text{ is defined and equals } r \\ \uparrow & \text{otherwise} \end{cases}$

Such a partial recursive function is said to be universal, as it is capable of
simulating any program for a register machine, and thus simulating any partial
recursive function.

Proof. We describe an algorithm to compute $u$: by Church’s thesis this will
suffice. On input $(n, k, m)$, we first check if $n$ is a code; if not, enter some
non-terminating loop. If so, decode $m$ as a $k$-tuple $((m)_1, \ldots, (m)_k)$ if possible (i.e.,
if $m > 0$ and the largest prime dividing $m$ is at most $p_k$); if this is unsuccessful,
enter a non-terminating loop. Otherwise, simulate the computation of program
$P_n$ with input registers $((m)_1, \ldots, (m)_k, 0, 0, \ldots)$. If $P_n((m)_1, \ldots, (m)_k)$
eventually halts, then we output the contents of register $R_1$. □


The implications of the existence of a universal partial recursive function 
(equivalently, a universal computing device) are what underpin the last 80 years 
of human technological advancements. To make this clear: a universal partial 
recursive function is often referred to as a programmable computer. With this, 
it is possible to construct one physical computation device, and then on it have 
the ability to simulate all possible computer programs, without any need to 
modify the hardware 11 . 

We now show that not every recursively enumerable set is recursive. 


Theorem 1.35 (Undecidability and the halting set). 
The set

$\mathbb{K} := \{n \in \mathbb{N} \mid f_{n,1}(n) \downarrow\}$

is known as the halting set. It is recursively enumerable, but not recursive.

11 If you are still unconvinced of the importance of universal computation, imagine a world
where every update of your favourite operating system (Windows, macOS, Ubuntu, Android,
etc.) ships with a screwdriver.

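Proof. Given any $n \in \mathbb{N}$, we start computing $f_{n,1}(n)$; this halts iff $n \in \mathbb{K}$. As
this is a description of a partial algorithm, then by Church’s thesis we see that
$\mathbb{K}$ is r.e.

Now, suppose that $\mathbb{N} \setminus \mathbb{K}$ were r.e.; we proceed by contradiction. For if it
were, then there would exist some $N$ such that $W_N = \mathbb{N} \setminus \mathbb{K}$ (that is, $\mathbb{N} \setminus \mathbb{K}$ is
the domain of $f_{N,1}$). So now we ask whether $N$ is in $\mathbb{K}$ or its complement. But
we see that

$N \in \mathbb{K} \Leftrightarrow f_{N,1}(N) \downarrow$
$\Leftrightarrow N \in W_N$
$\Leftrightarrow N \in \mathbb{N} \setminus \mathbb{K}$
$\Leftrightarrow N \notin \mathbb{K}$

which is a contradiction. So $\mathbb{N} \setminus \mathbb{K}$ is not r.e., and hence $\mathbb{K}$ is not recursive. □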

What this is telling us is the following: Given a partial recursive function
$f_{n,1}$, and an input $t$, we cannot compute in advance whether or not $f_{n,1}(t)$ is
defined (that is, whether or not the computation halts). If it is defined,
we can verify this computationally. But if it is not defined, then there are cases
when we can never ‘be sure’ that this is indeed the case. So even though $f_{n,1}$
completely defines $W_n$ (existentially), we can’t always compute membership in
$W_n$ algorithmically.

It is important to note that universality implies undecidability. That is, just 
using the statement ‘there is a universal register machine’, it is possible to 
prove the statement ‘there is an r.e. set which is not recursive’. One might 
lament the fact that we encounter undecidability in the study of computation, 
but it is this very fact which allows us to build universal (also known as
programmable) computers; if we did not have undecidability, then we could not
have universality 12.

1.9. Reductions. 

We now look at another way to show that certain sets are not recursive (or 
even r.e.), and that is via reductions. Intuitively, we look for ways to conclude, 
in a computational manner, membership in one set X from membership in 
another set Y. Thus, if we know that we can’t decide membership in X, then 
it means we can’t decide membership in Y. 

Definition 1.36 (Many-one reductions). 

Given two sets $A, B \subseteq \mathbb{N}$, a many-one reduction of $A$ to $B$ is a total recursive
function $f : \mathbb{N} \to \mathbb{N}$ such that, for all $n \in \mathbb{N}$, we have

$n \in A \Leftrightarrow f(n) \in B$

If there is a many-one reduction of $A$ to $B$, then we say that $A$ many-one reduces
to $B$, or $A$ is many-one reducible to $B$, and we write this as $A \leq_m B$.

So in order to compute membership of $n$ in $A$, we evaluate the function
$f(n)$ and then ‘ask $B$ one question: Is $f(n)$ in $B$ or not?’ If so, $n \in A$; if
not, $n \notin A$; in either case, we cannot do any computation after this question.
The name ‘many-one reduction’ comes from the fact that membership in $A$ of
several elements $n_1, n_2, \ldots$ can reduce to testing if one element lies in $B$ (that
is, we might have $f(n_1) = f(n_2) = \ldots$). This differs from the notion of Turing
reductions (which is beyond the scope of this course), where we are allowed to
ask $B$ whether several elements lie inside or outside it, and we can carry out
computational steps in between these questions.

12 Undecidability is not only interesting, but also inherently useful.

Lemma 1.37. $A \leq_m B \Leftrightarrow \mathbb{N} \setminus A \leq_m \mathbb{N} \setminus B$.

Proof. This follows immediately from the definition of many-one reductions. □

Many-one reductions help us identify sets which are/aren’t recursive or r.e.

Lemma 1.38. 

a) If $A \leq_m B$, and $B$ is r.e., then so is $A$.

b) If $A \leq_m B$, and $B$ is recursive, then so is $A$.

Proof.

a) Let $A \leq_m B$ via the total recursive function $f$. As $B$ is r.e., it is the
domain of some partial recursive function $g$ (that is, $x \in B \Leftrightarrow g(x) \downarrow$). So $A$ is
thus the domain of $g \circ f$, which is partial recursive. Hence $A$ is also r.e.

b) Let $A \leq_m B$ via the total recursive function $f$. Then the same map $f$ is a
many-one reduction of $\mathbb{N} \setminus A$ to $\mathbb{N} \setminus B$ (as $x \notin A \Leftrightarrow f(x) \notin B$). Thus, by a), $A$
and $\mathbb{N} \setminus A$ are both r.e. (as $B$ and $\mathbb{N} \setminus B$ are both r.e.). So $A$ is recursive. □

We now introduce a useful idea, which shows that holding some of the variables
of a partial recursive function fixed gives us another partial recursive
function (which we can construct a register machine for). This is often
referred to as currying, named after Haskell Curry. The theorem itself is called
the ‘s-m-n theorem’, named after the notation used in the original proof by
Kleene 13.

Theorem 1.39 (The s-m-n theorem). 

For all $m, n > 0$, a partial function $h : \mathbb{N}^{m+n} \to \mathbb{N}$ is partial recursive if
and only if there is a total recursive function $g : \mathbb{N}^m \to \mathbb{N}$ such that, for all
$(e_1, \ldots, e_m, x_1, \ldots, x_n) \in \mathbb{N}^{m+n}$, we have that

$h(e_1, \ldots, e_m, x_1, \ldots, x_n) = f_{g(e_1, \ldots, e_m),n}(x_1, \ldots, x_n)$

Here ‘=’ is interpreted to include ‘one side is defined iff the other side is’.

Proof. Suppose $h$ satisfies the hypotheses of the theorem; we show that it is
partial recursive. Given input $(e_1, \ldots, e_m, x_1, \ldots, x_n)$, we first compute the
total recursive function $g(e_1, \ldots, e_m) = M$, and then start the computation of
$f_{M,n}(x_1, \ldots, x_n)$ via the register machine with program $P_M$ (if $M$ is not a
code then we simply say that $h$ is undefined for this input). If the computation
of $f_{M,n}(x_1, \ldots, x_n)$ ever halts, then we take the output as the value of
$h(e_1, \ldots, e_m, x_1, \ldots, x_n)$. Given that we have completely described an
algorithm to partially compute $h$, then by Church’s thesis we have that $h$ is partial
recursive.

Now, suppose that $h$ is partial recursive. For each $(e_1, \ldots, e_m) \in \mathbb{N}^m$, we
describe a function $k_{(e_1, \ldots, e_m)} : \mathbb{N}^n \to \mathbb{N}$ as follows: given input $(x_1, \ldots, x_n)$,
start the computation of $h(e_1, \ldots, e_m, x_1, \ldots, x_n)$, and if this halts, take the
output as $k_{(e_1, \ldots, e_m)}(x_1, \ldots, x_n)$. Thus we have a complete description of an
algorithm which partially computes $k_{(e_1, \ldots, e_m)}$; thus by Church’s thesis we can
construct, from $(e_1, \ldots, e_m)$, a code (call it $g(e_1, \ldots, e_m)$) for a register machine
$P_{g(e_1, \ldots, e_m)}$ which partially computes the function $k_{(e_1, \ldots, e_m)}$. That is,

$k_{(e_1, \ldots, e_m)} = f_{g(e_1, \ldots, e_m),n} : \mathbb{N}^n \to \mathbb{N}$

But this is a total algorithm which describes how to construct the (total)
function $g : \mathbb{N}^m \to \mathbb{N}$, and so by another application of Church’s thesis we see that
$g$ is total recursive. As $h(e_1, \ldots, e_m, x_1, \ldots, x_n) = f_{g(e_1, \ldots, e_m),n}(x_1, \ldots, x_n)$ by
definition, we have that $h$ satisfies the required conditions. □

13 And not for any deeper or more insightful reason.
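
For readers who know a programming language, the ‘currying’ direction of this proof may feel familiar. The following Python sketch is only an analogy under a loud assumption: Python functions stand in for register machines, and the closure returned by g stands in for the code $g(e_1, \ldots, e_m)$; no integer coding of programs is attempted, and the name s_m_n is hypothetical.

```python
from functools import partial

def s_m_n(h, m):
    """Given h taking m + n arguments, return a total function g such that
    g(e_1, ..., e_m) is a 'program' computing
    (x_1, ..., x_n) -> h(e_1, ..., e_m, x_1, ..., x_n).

    Assumption: 'programs' are Python callables, not integer codes.
    """
    def g(*es):
        if len(es) != m:
            raise ValueError("expected exactly m fixed arguments")
        # Building the specialised program always terminates, even if
        # running it later does not: this mirrors g being total.
        return partial(h, *es)
    return g

# Hypothetical example with m = 1, n = 2.
h = lambda e, x, y: e * x + y
g = s_m_n(h, 1)
specialised = g(3)          # plays the role of the program with code g(3)
value = specialised(4, 5)   # agrees with h(3, 4, 5)
```

The design point to notice is that constructing the specialised program never runs h; it only packages the fixed arguments, which is why g is total even when h is not.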

We use this to show that the halting set K is the strongest r.e. set under 
many-one reductions, in the following sense. 

Theorem 1.40. A set $X \subseteq \mathbb{N}$ is r.e. if and only if $X \leq_m \mathbb{K}$.

Proof. From Lemma 1.38, we see that if $X \leq_m \mathbb{K}$ then $X$ must be r.e. (as $\mathbb{K}$
is). Now, suppose that $X$ is r.e. Define the partial function $f : \mathbb{N}^2 \to \mathbb{N}$ via

$f(e, n) := \begin{cases} 1 & \text{if } e \in X \\ \uparrow & \text{otherwise} \end{cases}$

To compute $f$ on an input $(e, n)$ we begin computing $\phi_X(e)$,
and this will halt iff $e \in X$. When it does, output 1 for $f(e, n)$. Thus, by
Church’s thesis, $f$ is partial recursive. So by Theorem 1.39, there is a total
recursive function $g : \mathbb{N} \to \mathbb{N}$ with $f(e, n) = f_{g(e),1}(n)$ for all $(e, n) \in \mathbb{N}^2$. So
now we see that

$e \in X \Leftrightarrow f(e, g(e)) \downarrow$
$\Leftrightarrow f_{g(e),1}(g(e)) \downarrow$
$\Leftrightarrow g(e) \in \mathbb{K}$

So $e \in X \Leftrightarrow g(e) \in \mathbb{K}$, where $g$ is a total recursive function. Thus $X \leq_m \mathbb{K}$. □

The idea here is that, with absolute knowledge of $\mathbb{K}$, we have absolute
knowledge of each r.e. set, in a computable way.

We can also apply the s-m-n theorem to prove that every computable function 
has a ‘fixed point’. This is known as the recursion theorem. 

Theorem 1.41 (The recursion theorem). 

For each total recursive function $h : \mathbb{N} \to \mathbb{N}$, there is some $n \in \mathbb{N}$ with $f_{n,1} =
f_{h(n),1}$ as functions.

Proof. Consider the function $g : \mathbb{N}^2 \to \mathbb{N}$ given by

$g(x, y) := u(h(u(x, 1, 3^x)), 1, 3^y)$

where $u$ is the universal partial recursive function from Theorem 1.34. By the
s-m-n theorem, we can ‘curry’ this, and find a total recursive function on one
variable (say $f_{m,1}$) for which

$g(x, y) = f_{f_{m,1}(x),1}(y) \quad \forall x, y \in \mathbb{N}$

Let $n = f_{m,1}(m)$; this will be our fixed point. Then for all $y$ with $f_{n,1}(y) \downarrow$ we
have

$f_{n,1}(y) = f_{f_{m,1}(m),1}(y)$
$= g(m, y)$
$= u(h(u(m, 1, 3^m)), 1, 3^y)$
$= u(h(f_{m,1}(m)), 1, 3^y)$ (by definition of $u$)
$= u(h(n), 1, 3^y)$
$= f_{h(n),1}(y)$

The above reasoning also shows that $f_{n,1}(y) \uparrow \Leftrightarrow f_{h(n),1}(y) \uparrow$.

Thus $f_{n,1} = f_{h(n),1}$ as functions. □

1.10. Rice’s theorem. 

One would, of course, like to compute things about r.e. sets. It would be 
useful, for example, to be able to determine (in a computable way) whether or 
not $W_n$ is all of $\mathbb{N}$, as this would tell us precisely when $f_{n,1}$ is total. We will
soon see that this is not possible; moreover, there is no (non-trivial) property 
of r.e. sets that we can compute! 

Definition 1.42. A property of r.e. sets is a map 

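$p : \{A \subseteq \mathbb{N} \mid A \text{ is r.e.}\} \to \{0, 1\}$

where 0, 1 represent ‘false’ and ‘true’ respectively.

For example, the property of ‘being empty’ is represented by the map

$p(X) := \begin{cases} 1 & \text{if } X = \emptyset \\ 0 & \text{if } X \neq \emptyset \end{cases}$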


In order to compute whether an r.e. set has a particular property or not, we 
need a finite way to describe this r.e. set. We can take a code n for the register 
machine P n which describes the characteristic function of the r.e. set, but note 
that it is the set which does or doesn’t have the property, independent of which 
register machine we pick to describe it (and there may be many). So really, we 
are computing $p(W_n)$ (and actually, we can view this as computing $p(n)$). So
which properties can we compute? 


Example 1.43. The property ‘being non-empty’ is r.e. but not recursive. That
is, the set $I = \{n \in \mathbb{N} \mid n \text{ codes a program and } W_n \neq \emptyset\}$ is r.e., but not
recursive.

Proof. Take $n$ and compute if it is a code for a program. If so, start a diagonal
process to begin computing $f_{n,1}(1), f_{n,1}(2), \ldots$. One of these will terminate iff
$W_n$ is non-empty, and so this index set is r.e. by Church’s thesis.

On the other hand, take an integer $n$, and define a partial function $g$ via

$g(n, x) := \begin{cases} 1 & \text{if } n \in \mathbb{K} \\ \uparrow & \text{otherwise} \end{cases}$

So by Church’s thesis $g$ is partial recursive, and by Theorem 1.39 there is a
recursive function $h : \mathbb{N} \to \mathbb{N}$ such that $g(n, x) = f_{h(n),1}(x)$ $\forall (n, x) \in \mathbb{N}^2$. If
$n \in \mathbb{K}$ then $W_{h(n)} = \mathbb{N} \neq \emptyset$. If $n \notin \mathbb{K}$ then $W_{h(n)} = \emptyset$. Thus $n \in \mathbb{N} \setminus \mathbb{K} \Leftrightarrow
W_{h(n)} = \emptyset \Leftrightarrow h(n) \in \mathbb{N} \setminus I$, and so we have a many-one reduction from a
non-r.e. set $\mathbb{N} \setminus \mathbb{K}$ to the set $\mathbb{N} \setminus I$, and so the latter is not r.e. Hence $I$ is not
recursive. □

Definition 1.44. A property of r.e. sets $p$ is said to be nontrivial if there exist
two r.e. sets $A, B$ such that $p(A) = 0$ and $p(B) = 1$. That is, not all sets have
(or do not have) the property described.

It turns out that the only properties of r.e. sets we can algorithmically
recognise are the trivial ones 14.


Theorem 1.45 (Rice’s theorem). 

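Let $\mathcal{C}$ be a non-trivial class of r.e. sets, and $I$ the set of indices which code
programs and give r.e. sets in $\mathcal{C}$. That is,

$I = \{n \in \mathbb{N} \mid n \text{ encodes a register machine and } W_n \in \mathcal{C}\}$

If $\emptyset \notin \mathcal{C}$ then $\mathbb{K} \leq_m I$; if $\emptyset \in \mathcal{C}$ then $\mathbb{K} \leq_m (\mathbb{N} \setminus I)$.

Proof. There are two cases to consider here.

Case 1: $\emptyset \notin \mathcal{C}$. In this case, fix any r.e. set $\emptyset \neq A \in \mathcal{C}$. Now define the following
partial function $g : \mathbb{N}^2 \to \mathbb{N}$ by

$g(n, x) := \begin{cases} 1 & \text{if } n \in \mathbb{K} \text{ and } x \in A \\ \uparrow & \text{otherwise} \end{cases}$

As $\mathbb{K}$ and $A$ are both r.e., this describes an algorithm which halts (with output 1)
precisely on the inputs in the first case, and so
by Church’s thesis $g$ is partial recursive. So by Theorem 1.39 there is a total
recursive function $h : \mathbb{N} \to \mathbb{N}$ such that $g(n, x) = f_{h(n),1}(x)$ $\forall (n, x) \in \mathbb{N}^2$. Notice
that $n \in \mathbb{K} \Rightarrow W_{h(n)} = A \Rightarrow W_{h(n)} \in \mathcal{C}$, and $n \notin \mathbb{K} \Rightarrow W_{h(n)} = \emptyset \Rightarrow W_{h(n)} \notin \mathcal{C}$.
Thus $n \in \mathbb{K} \Leftrightarrow h(n) \in I$, and so we have a many-one reduction $\mathbb{K} \leq_m I$.

Case 2: $\emptyset \in \mathcal{C}$ (analogous to the first case). In this case, fix any r.e. set
$\emptyset \neq A \notin \mathcal{C}$. Now define the following partial function $g : \mathbb{N}^2 \to \mathbb{N}$ by

$g(n, x) := \begin{cases} 1 & \text{if } n \in \mathbb{K} \text{ and } x \in A \\ \uparrow & \text{otherwise} \end{cases}$

As before, this describes an algorithm which halts precisely on the inputs in the
first case, and so
by Church’s thesis $g$ is partial recursive. So by Theorem 1.39 there is a total
recursive function $h : \mathbb{N} \to \mathbb{N}$ such that $g(n, x) = f_{h(n),1}(x)$ $\forall (n, x) \in \mathbb{N}^2$. Notice
that $n \in \mathbb{K} \Rightarrow W_{h(n)} = A \Rightarrow W_{h(n)} \notin \mathcal{C}$, and $n \notin \mathbb{K} \Rightarrow W_{h(n)} = \emptyset \Rightarrow W_{h(n)} \in \mathcal{C}$.
Thus $n \in \mathbb{K} \Leftrightarrow h(n) \in \mathbb{N} \setminus I$, and so we have a many-one reduction $\mathbb{K} \leq_m
\mathbb{N} \setminus I$. □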


Corollary 1.46. Every non-trivial property of r.e. sets is undecidable (i.e., 
nonrecursive). That is, if p is a non-trivial property of r.e. sets, then the set 

$\{n \in \mathbb{N} \mid n \text{ encodes a register machine and } p(W_n) = 1\}$

is not recursive. 


Thus, if you are given an r.e. set $W_n$ and asked some non-trivial question
about it (e.g., Is it finite? Is it empty? Does it contain more than 55 elements?
Does it contain 9 but not 6? Is it recursive? Is it co-finite? Are all its elements
even?), then you have no way of answering in an algorithmic manner. You may
be able to answer the question for some particular cases of n, but not for all 
cases. 


14 Hopefully by now this does not come as a surprise to you.

2. REGULAR LANGUAGES AND FINITE-STATE AUTOMATA

We have seen the most general types of computing machines. Now we turn 
our attention to some more restrictive machines, which are less powerful but 
easier to work with. 

2.1. Deterministic finite-state automata. 

Definition 2.1 (Languages). 

Let $X = \{x_1, \ldots, x_n\}$ be a finite set. Then we define $X^*$ to be the set of all
finite strings of elements of $X$ (including the empty string $\epsilon$). We often refer to
elements of $X^*$ as words. A language over $X$ is any subset of $X^*$.

Sometimes we can describe languages in a nice way. For example, take $X =
\{0, 1\}$. Then the following are all languages over $X$:

(1) All words that start with 0.

(2) All words that contain the same number of 0’s and 1’s.

(3) All words which, for some fixed $n$, are the binary expansion for an
integer which lies in $W_n$.

As we can see, some languages (like (3) above) will thus be undecidable! We
can also take a different finite set, such as $A = \{a, b, c, \ldots, x, y, z\}$, and take our
language to be all strings which give English words. We now give another way 
to describe some languages. 

Definition 2.2 (Deterministic finite-state automata). 

A deterministic finite-state automaton (DFA) is a structure $D = (Q, \Sigma, \delta, q_0, F)$
consisting of the following:

(1) A finite set of states $Q$.

(2) A finite input alphabet $\Sigma$.

(3) A transition function $\delta : Q \times \Sigma \to Q$ which is total.

(4) A designated start state $q_0 \in Q$.

(5) A finite set of accept states $F \subseteq Q$.

The input of a DFA is any finite string $w = \sigma_1 \ldots \sigma_k \in \Sigma^*$. The DFA takes
$w$, reads the first symbol $\sigma_1$ whilst ‘in’ the start state $q_0$, and then evaluates
the transition function $\delta(q_0, \sigma_1) = p_1$ to ‘move to’ a new state. The DFA then
reads the next symbol $\sigma_2$ of $w$, and evaluates $\delta(p_1, \sigma_2)$, and moves to the next
state. This continues for the entire string $w$.

If at the end of this process the DFA is in one of the accept states $F$, then
we say that $w$ is accepted by $D$. Otherwise, we say that $w$ is rejected.
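
The way a DFA processes its input can be sketched in a few lines of Python. This is a minimal illustration, not part of the formal development: the function name run_dfa, the dictionary representation of the transition function, and the state names in the example are all assumptions chosen for readability.

```python
def run_dfa(delta, start, accept, word):
    """Return True iff the DFA accepts `word`.

    delta  : dict mapping (state, symbol) -> state; must be total
    start  : the start state
    accept : the set of accept states
    """
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one move per symbol read
    return state in accept

# Hypothetical three-state DFA for 'all words over {0, 1} that start with 0'.
DELTA = {
    ("q0", "0"): "yes", ("q0", "1"): "no",
    ("yes", "0"): "yes", ("yes", "1"): "yes",
    ("no", "0"): "no", ("no", "1"): "no",
}

accepted = run_dfa(DELTA, "q0", {"yes"}, "0110")
rejected = run_dfa(DELTA, "q0", {"yes"}, "10")
```

Since the transition function is total, the dictionary lookup never fails on a word over the alphabet; a missing key would signal that the table does not describe a DFA.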

The above description is not very intuitive. There are more convenient ways 
to describe DFA’s, such as transition diagrams and transition tables. 

Definition 2.3 (Transition diagrams). 

A transition diagram for a DFA $D = (Q, \Sigma, \delta, q_0, F)$ is a directed graph $\Gamma_D$ with
the following properties:

(1) The vertex set of $\Gamma_D$ is precisely the set of states $Q$, labelled as such.

(2) For each $(q, \sigma) \in Q \times \Sigma$, we add a directed edge from $q$ to $\delta(q, \sigma)$, and
label this with $\sigma$.

(3) We add one additional directed edge from ‘nowhere’ to the vertex $q_0$,
and label this ‘start’.

(4) For clarity, we draw each vertex as the state $q$ with a circle drawn around
it; if $q \in F$ then we instead draw two circles around $q$.

Notice that every vertex will have precisely $|\Sigma|$ edges leading out of it; one
for each $\sigma \in \Sigma$. Hence the name deterministic: our next move is completely
determined. Now, to process an input $w = \sigma_1 \ldots \sigma_k \in \Sigma^*$, we place a small
movable marker at $q_0$, then read the first symbol $\sigma_1$ of $w$ and move the marker
along the edge out of $q_0$ labelled by $\sigma_1$ and to the adjacent vertex $\delta(q_0, \sigma_1)$.
Then read the next symbol $\sigma_2$ of $w$ and repeat the process. Upon reaching
the end of the string $w$, note which state the marker has reached. This
completely describes the given DFA.

Transition diagrams are very intuitive, and give a clear ‘picture’ of what that 
DFA does. However, they take up a lot of space on a page. There is a more 
compact way to describe a DFA. 

Definition 2.4 (Transition tables). 

A transition table for a DFA $D = (Q, \Sigma, \delta, q_0, F)$ is a table $T_D$ with:

(1) Labels down the left of the table; one for each state in $Q$.

(2) Labels across the top of the table; one for each symbol in $\Sigma$.

(3) Entries in the middle of the table; position $(q, \sigma)$ is given value $\delta(q, \sigma)$.

(4) For clarity, we place a star $*q$ next to the states down the left of the
table which correspond to accepting states, and we place an arrow $\to q_0$
next to the state on the left of the table which is the start state.

As you can see, from this table we can read off the transition function $\delta$, as
well as the states $Q$, the accept states $F$, the start state $q_0$, and the alphabet
$\Sigma$. This completely describes the given DFA.

2.2. Regular languages. 

Given a DFA $D = (Q, \Sigma, \delta, q_0, F)$, we often want to know ‘where does $w$ end
up when input into $D$?’ We can define a function to do this.

Definition 2.5 (Extended transition function).

Let $D = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We define the extended transition function
of $D$, $\hat{\delta} : Q \times \Sigma^* \to Q$, inductively via:

$\hat{\delta}(q, \epsilon) := q$ for $q \in Q$

$\hat{\delta}(q, \sigma) := \delta(q, \sigma)$ for $q \in Q$, $\sigma \in \Sigma$

$\hat{\delta}(q, \sigma_1 \ldots \sigma_k) := \delta(\hat{\delta}(q, \sigma_1 \ldots \sigma_{k-1}), \sigma_k)$ for $q \in Q$, $\sigma_1, \ldots, \sigma_k \in \Sigma$

In particular, for any $w \in \Sigma^*$, $\hat{\delta}(q_0, w)$ tells us the state that we end up at when
we input $w$ into $D$.
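
The inductive definition above translates almost line for line into a recursive function. The sketch below is illustrative only; delta_hat and the example table PARITY (a hypothetical DFA tracking whether an even or odd number of 1’s has been read) are names introduced here, and words are represented as Python strings.

```python
def delta_hat(delta, q, word):
    """Extended transition function of a DFA, following the inductive
    definition: peel off the final symbol and recurse on the prefix."""
    if word == "":
        return q  # base case: the empty word leaves the state unchanged
    prefix, last = word[:-1], word[-1]
    return delta[(delta_hat(delta, q, prefix), last)]

# Hypothetical two-state DFA tracking the parity of the number of 1's read.
PARITY = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

end_state = delta_hat(PARITY, "even", "1101")
```

One can also use such a function to spot-check identities about the extended transition function on small examples, for instance that processing a word in two pieces lands in the same state as processing it in one go.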

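Lemma 2.6. Let $D = (Q, \Sigma, \delta, q_0, F)$ be a DFA. Then, for all $1 \leq l < k$, all
$\sigma_1 \ldots \sigma_k \in \Sigma^*$, and all $q \in Q$, we have that

$\hat{\delta}(q, \sigma_1 \ldots \sigma_k) = \hat{\delta}(\hat{\delta}(q, \sigma_1 \ldots \sigma_l), \sigma_{l+1} \ldots \sigma_k)$

Proof. We proceed by induction. Clearly the statement is true for $k = 1$; assume
it is true for all $k < m$. Then we have

$\hat{\delta}(q, \sigma_1 \ldots \sigma_m) = \delta(\hat{\delta}(q, \sigma_1 \ldots \sigma_{m-1}), \sigma_m)$ (definition of $\hat{\delta}$)

$= \delta(\hat{\delta}(\hat{\delta}(q, \sigma_1 \ldots \sigma_l), \sigma_{l+1} \ldots \sigma_{m-1}), \sigma_m)$ (induction)

$= \hat{\delta}(\hat{\delta}(q, \sigma_1 \ldots \sigma_l), \sigma_{l+1} \ldots \sigma_m)$ (definition of $\hat{\delta}$)

□

Definition 2.7 (Regular languages).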

Let $D = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We define the language of $D$, $\mathcal{L}(D)$, to be
the words which are accepted by $D$. That is,

$\mathcal{L}(D) := \{w \in \Sigma^* \mid \hat{\delta}(q_0, w) \in F\}$

This is the language of words over $\Sigma$ which are taken to an accepting state in
$D$. We say a language $L$ is regular if $L = \mathcal{L}(D)$ for some DFA $D$.

2.3. Nondeterminism. 

We now define a (seemingly) more general form of finite state automaton: 
one which can ‘explore’ several possibilities simultaneously. The only difference 
between these and the DFA’s that we saw before is that these new automata 
have a non-deterministic transition function $\delta$. We adopt the notation $\mathcal{P}(X)$
for the power set of a set $X$.

Definition 2.8 (Nondeterministic finite-state automata). 

A nondeterministic finite-state automaton (NFA) is a structure $N = (Q, \Sigma, \delta, q_0, F)$
consisting of the following:

(1) A finite set of states $Q$.

(2) A finite input alphabet $\Sigma$.

(3) A transition function $\delta : Q \times \Sigma \to \mathcal{P}(Q)$ which is total.

(4) A designated start state $q_0 \in Q$.

(5) A finite set of accept states $F \subseteq Q$.

The input of an NFA is any finite string $w = \sigma_1 \ldots \sigma_k \in \Sigma^*$. The NFA takes
$w$, reads the first symbol $\sigma_1$ whilst ‘in’ the start state $q_0$, and then evaluates
the transition function $\delta(q_0, \sigma_1) = \{p_1, \ldots, p_m\}$ to simultaneously ‘move to’ all
the new states. The NFA then reads the next symbol $\sigma_2$ of $w$, and evaluates
$\delta(p, \sigma_2)$ for all $p \in \delta(q_0, \sigma_1)$, and simultaneously moves to all these states
$\bigcup_{p \in \delta(q_0, \sigma_1)} \delta(p, \sigma_2)$. This continues for the entire word $w$.

If at the end of this process the NFA is in a configuration that contains at least
one of the accept states $F$, then we say that $w$ is accepted by $N$. Otherwise,
we say that $w$ is rejected.

Be aware that the transition function $\delta$ might give the empty set on certain
inputs. That is, we might have $\delta(q, \sigma) = \emptyset$ for some $\sigma \in \Sigma$. This is fine.

The utility of having a nondeterministic transition function is that we can 
‘explore’ many possibilities for an input word at once. 

Definition 2.9 (Transition diagrams and transition tables for NFA’s). 

We define the transition diagram $\Gamma_N$ and transition table $T_N$ for the NFA $N$ in
essentially the same way that we define them for a DFA. The only differences
are:

(1) For the transition diagram of an NFA, we might have several directed
edges out of the same state with the same label, or even none at all.

(2) For the transition table of an NFA, the entries in the interior of our
table will be sets of states (including possibly the empty set).

NFA’s are best understood via their transition diagrams. To see how an NFA
$N$ processes an input $w = \sigma_1 \ldots \sigma_k \in \Sigma^*$, we place a small marker at $q_0$, then
read the first symbol $\sigma_1$ of $w$. Now we take more markers and move them along
all the edges out of $q_0$ labelled by $\sigma_1$ and to the adjacent vertex set $\delta(q_0, \sigma_1)$,
and then we remove the original marker. Then read the next symbol $\sigma_2$ of $w$
and repeat the process. (It helps to have two colours of markers: red and blue.
For each iteration, we take the new markers to be the opposite colour to the
old markers, and then remove the old markers.) Upon reaching the end of the
word $w$, note which states the markers are on; if any of these are accept
states, then $N$ accepts $w$, otherwise it is rejected. This completely describes the
given NFA.

Be aware that, in the above description, we might reach state $q$, read symbol
$\sigma$, and have that there is no edge out of $q$ labelled $\sigma$ (that is, $\delta_N(q, \sigma) = \emptyset$).
This is fine. In this case, the marker in question merely ‘drops off’.

Definition 2.10 (Extended transition function for NFA’s). 

Let $N = (Q, \Sigma, \delta, q_0, F)$ be an NFA. We define the extended transition function
of $N$, $\hat{\delta} : Q \times \Sigma^* \to \mathcal{P}(Q)$, inductively via:

$\hat{\delta}(q, \epsilon) := \{q\}$ for $q \in Q$

$\hat{\delta}(q, \sigma) := \delta(q, \sigma)$ for $q \in Q$, $\sigma \in \Sigma$

$\hat{\delta}(q, \sigma_1 \ldots \sigma_k) := \bigcup_{p \in \hat{\delta}(q, \sigma_1 \ldots \sigma_{k-1})} \delta(p, \sigma_k)$

In particular, for any $w \in \Sigma^*$, $\hat{\delta}(q_0, w)$ gives us the set of states we end up at
when we input $w$ into $N$.

With this, we can say what the language of an NFA is.

Definition 2.11 (Language of an NFA).

Let $N = (Q, \Sigma, \delta, q_0, F)$ be an NFA. We define the language of $N$, $\mathcal{L}(N)$, to be
the words which are accepted by $N$. That is,

$\mathcal{L}(N) := \{w \in \Sigma^* \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$

This is the language of words over $\Sigma$ for which $\hat{\delta}(q_0, w)$ contains at least one
accepting state of $N$.
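
The ‘markers’ description of an NFA corresponds to tracking a set of current states. The following Python sketch illustrates this under some assumptions: nfa_states and nfa_accepts are hypothetical names, the transition function is a dictionary in which a missing key stands for the empty set, and the example NFA (accepting words over {0, 1} that end in 01) is made up for this illustration.

```python
def nfa_states(delta, start, word):
    """Return the set of states the NFA can be in after reading `word`.

    delta maps (state, symbol) -> set of states; a missing key is read as
    the empty set (the corresponding marker 'drops off').
    """
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((p, symbol), set()) for p in current))
    return current

def nfa_accepts(delta, start, accept, word):
    """Accept iff at least one reachable state is an accept state."""
    return bool(nfa_states(delta, start, word) & set(accept))

# Hypothetical NFA for 'words over {0, 1} ending in 01'.
NFA_DELTA = {
    ("a", "0"): {"a", "b"},
    ("a", "1"): {"a"},
    ("b", "1"): {"c"},
}

reachable = nfa_states(NFA_DELTA, "a", "1101")
```

In the example, state a ‘guesses’ on each 0 whether this is the penultimate symbol; the set-of-states simulation explores both guesses at once.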

2.4. Equivalence of DFA’s and NFA’s. 

It is straightforward to show that any regular language is the language
accepted by some NFA. What is less obvious is that the converse is true: every
language accepted by an NFA is also accepted by some DFA. To do this, we
employ what is known as the subset construction, which takes an NFA $N$ and
produces a DFA $D$ on the same alphabet such that $\mathcal{L}(N) = \mathcal{L}(D)$.

Definition 2.12 (The subset construction). 

Let $N = (Q_N, \Sigma, \delta_N, q_0, F_N)$ be an NFA. We define the following construction
of a DFA $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$, called the subset construction, as follows:

(1) $Q_D := \mathcal{P}(Q_N)$; the power set of $Q_N$.

(2) $F_D := \{S \subseteq Q_N \mid S \cap F_N \neq \emptyset\}$; the set of subsets of $Q_N$ which intersect
the accepting states $F_N$ of $N$.

(3) For each $S \subseteq Q_N$ and each $\sigma \in \Sigma$, we define

$\delta_D(S, \sigma) := \bigcup_{p \in S} \delta_N(p, \sigma)$

which is the set of states reached from the states in $S$ by going along
an edge labelled $\sigma$ in $\Gamma_N$.

Observe that the start state of $D$ constructed above is $\{q_0\}$; that is, the set
containing $q_0$. This is because the set of states is $\mathcal{P}(Q_N)$.

It is usually easiest to give the subset construction as a transition table, as it 
often becomes complicated when trying to draw the transition diagram. 
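
Since the subset construction is entirely mechanical, it can be carried out by a short program that produces the transition table. The Python sketch below follows Definition 2.12 literally (it builds all of the power set, not just the reachable subsets); the function name subset_construction, the use of frozenset to represent states of the DFA, and the convention that a missing key in the NFA table means the empty set are all assumptions of this illustration.

```python
from itertools import chain, combinations

def subset_construction(states, alphabet, delta, start, accept):
    """Build the DFA of the subset construction from an NFA.

    Returns (dfa_states, dfa_delta, dfa_start, dfa_accept), where each DFA
    state is a frozenset of NFA states.
    """
    ordered = sorted(states)
    # (1) The DFA states are all subsets of the NFA states.
    dfa_states = [frozenset(c) for c in chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1))]
    # (3) delta_D(S, a) is the union of delta_N(p, a) over p in S.
    dfa_delta = {
        (S, a): frozenset().union(*(delta.get((p, a), frozenset()) for p in S))
        for S in dfa_states for a in alphabet
    }
    # (2) A subset is accepting iff it meets the NFA's accept states.
    dfa_accept = {S for S in dfa_states if S & frozenset(accept)}
    return dfa_states, dfa_delta, frozenset({start}), dfa_accept
```

An NFA with $n$ states yields a table with $2^n$ rows here, which is one reason the transition diagram quickly becomes unwieldy; in practice one would often only tabulate the subsets reachable from the start state.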

Theorem 2.13 (Extended transition function in the subset construction). 

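Let $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ be the DFA constructed from the NFA $N =
(Q_N, \Sigma, \delta_N, q_0, F_N)$ via the subset construction (Definition 2.12). Then, for any
$w \in \Sigma^*$, we have

$\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_N(q_0, w)$

Proof. We proceed by induction on $|w|$. Clearly, if $|w| = 0$ (that is, $w = \epsilon$),
then $\hat{\delta}_D(\{q_0\}, \epsilon) = \{q_0\}$, and also $\hat{\delta}_N(q_0, \epsilon) = \{q_0\}$.
Now suppose that $\hat{\delta}_D(\{q_0\}, v) = \hat{\delta}_N(q_0, v)$ for all $v$ with $|v| \leq n$. Let $w$ be
a word of length $n + 1$. Then $w = x\sigma$, where $\sigma \in \Sigma$ is the final symbol of $w$.
By induction, as $|x| = n$, we have that $\hat{\delta}_D(\{q_0\}, x) = \hat{\delta}_N(q_0, x)$. Call this set
$\{p_1, \ldots, p_k\} \subseteq Q_N$. By definition of $\hat{\delta}$ for NFA’s, we have

$\hat{\delta}_N(q_0, x\sigma) = \bigcup_{q \in \hat{\delta}_N(q_0, x)} \delta_N(q, \sigma) = \bigcup_{i=1}^{k} \delta_N(p_i, \sigma)$

Now, the subset construction gives that

$\delta_D(\{p_1, \ldots, p_k\}, \sigma) = \bigcup_{i=1}^{k} \delta_N(p_i, \sigma)$

So we have that

$\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_D(\{q_0\}, x\sigma)$

$= \delta_D(\hat{\delta}_D(\{q_0\}, x), \sigma)$

$= \delta_D(\{p_1, \ldots, p_k\}, \sigma)$

$= \bigcup_{i=1}^{k} \delta_N(p_i, \sigma)$

Thus we have that $\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_N(q_0, w)$, and the induction is complete. □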

We can now use this to say something about the accepted languages of these 
automata. 

Theorem 2.14 (Equivalence of language in the subset construction). 

Let $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ be the DFA constructed from the NFA $N =
(Q_N, \Sigma, \delta_N, q_0, F_N)$ via the subset construction. Then $\mathcal{L}(D) = \mathcal{L}(N)$.

Proof. From the previous theorem, we see that, for any $w \in \Sigma^*$,

$D$ accepts $w$ $\Leftrightarrow$ $\hat{\delta}_D(\{q_0\}, w)$ contains a state in $F_N$
$\Leftrightarrow$ $\hat{\delta}_N(q_0, w)$ contains a state in $F_N$
$\Leftrightarrow$ $N$ accepts $w$

Thus $\mathcal{L}(D) = \mathcal{L}(N)$, as both $D$ and $N$ have the same alphabet $\Sigma$. □

Theorem 2.15. A language $L$ is accepted by some DFA if and only if it is
accepted by some NFA.

Proof. We showed in Theorem 2.14 that, from any NFA, we can construct a
DFA on the same alphabet which accepts the same language. 

So now suppose we have that $L$ is accepted by some DFA $D = (Q, \Sigma, \delta_D, q_0, F)$.
Take a transition diagram for $D$. Then this also defines a transition diagram
for an NFA $N$ with the same states, same alphabet, and same accepting states.
We need only remark that the transition function $\delta_N$ for $N$ will be given by

$\delta_N(q, \sigma) := \{\delta_D(q, \sigma)\}$

That is, if $\delta_D(q, \sigma) = p$, then $\delta_N(q, \sigma) = \{p\}$. We prove by induction on $|w|$
that, if $\hat{\delta}_D(q_0, w) = p$, then $\hat{\delta}_N(q_0, w) = \{p\}$:

Basis: Let $|w| = 1$, so $w = \sigma \in \Sigma$. We have defined $\delta_N(q_0, \sigma) = \{\delta_D(q_0, \sigma)\}$, so
we’re done.

Induction: Suppose, for all $v$ with $|v| \leq n$, we have $\hat{\delta}_N(q_0, v) = \{\hat{\delta}_D(q_0, v)\}$.
Take $w$ with $|w| = n + 1$, then $w = x\sigma$ for some $x \in \Sigma^*$, $\sigma \in \Sigma$. Now we have
that:

$\hat{\delta}_N(q_0, x\sigma) = \bigcup_{p \in \hat{\delta}_N(q_0, x)} \delta_N(p, \sigma)$

$= \bigcup_{p \in \{\hat{\delta}_D(q_0, x)\}} \delta_N(p, \sigma)$

$= \delta_N(\hat{\delta}_D(q_0, x), \sigma)$

$= \{\delta_D(\hat{\delta}_D(q_0, x), \sigma)\}$

$= \{\hat{\delta}_D(q_0, x\sigma)\}$

Thus $D$ and $N = (Q, \Sigma, \delta_N, q_0, F)$ accept the same words, so $\mathcal{L}(D) = \mathcal{L}(N)$. □

So we see that, whenever we want to show that a language L is regular, it 
suffices to produce a DFA or an NFA which accepts L. 

2.5. ε-transitions on NFA’s.

An NFA allows us to ‘explore’ many paths through a transition diagram
simultaneously. However, we are constrained to not change the states that
we are ‘in’ until we read the next letter of the input word. By modifying
things slightly and introducing ε-transitions, we give ourselves an extra degree
of flexibility.

Definition 2.16 (e-NFA). 

An e-NFA is very similar to an NFA, in that it consists of: 

(1) A finite set of states Q. 

(2) A finite input alphabet E. 

(3) A transition function 6 : Q x (E U {e}) -A V(Q) which is total (that is, 
we now have transitions on the empty word e). 

(4) A designated start state qo € Q. 

(5) A finite set of accept states F C Q. 

We write this as E = (Q, E, <5, qo, F). 

Definition 2.17 (Transition diagrams and transition tables for ε-NFA’s).

We define the transition diagram $\mathcal{T}_E$ and transition table $T_E$ for the ε-NFA $E$ in
essentially the same way that we define them for an NFA. The only differences
are:

(1) For the transition diagram of an ε-NFA, we might have directed edges
out of a state labelled with $\epsilon$.

(2) For the transition table of an ε-NFA, $\epsilon$ is now one of the symbols at the
top of the table.

So basically, an ε-NFA looks just like an NFA, but with $\epsilon$ acting as an extra
‘symbol’. It still takes as input words $w \in \Sigma^*$, but processes them in a slightly
different way to an NFA. To describe this, we first need the notion of ε-closure.

Definition 2.18 (ε-closure).

Let $E = (Q, \Sigma, \delta, q_0, F)$ be an ε-NFA, and $q \in Q$. We define the ε-closure of $q$,
$\mathrm{eclose}(q)$, to be the set of all states that can be reached from $q$ by sequences of
transitions of the form $\delta(p, \epsilon)$ (such transitions are called ε-transitions). That
is, we inductively define sets of states $S_i(q)$ by

$$\begin{aligned}
S_0(q) &:= \{q\} \\
S_1(q) &:= \{q\} \cup \delta(q, \epsilon) \\
S_{i+1}(q) &:= S_i(q) \cup \Big( \bigcup_{r \in S_i(q)} \delta(r, \epsilon) \Big)
\end{aligned}$$

When this sequence stabilises (that is, when we find an $n$ with $S_{n+1}(q) = S_n(q)$;
and it will stabilise as there are only finitely many states in $E$), then we set

$$\mathrm{eclose}(q) := \bigcup_{i=0}^{\infty} S_i(q) = S_n(q)$$

If $S \subseteq Q$ is an arbitrary set of states, then we define

$$\mathrm{eclose}(S) := \bigcup_{s \in S} \mathrm{eclose}(s)$$

We say that $R \subseteq Q$ is ε-closed if $\mathrm{eclose}(R) = R$.
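
The inductive definition of the sets $S_i(q)$ translates almost directly into code. The sketch below assumes a dictionary encoding of the transition function in which the empty string `""` stands for ε and a missing key means the empty set; the three-arc example is hypothetical.

```python
def eclose_state(delta, q):
    """Return the ε-closure of a single state q.

    `delta` maps (state, symbol) pairs to sets of states; "" stands for ε.
    """
    current = {q}                       # S_0(q) = {q}
    while True:
        step = set(current)
        for r in current:               # S_{i+1}(q) = S_i(q) ∪ ⋃ δ(r, ε)
            step |= delta.get((r, ""), set())
        if step == current:             # the sequence has stabilised
            return current
        current = step


def eclose(delta, states):
    """ε-closure of a set of states: the union of the individual closures."""
    result = set()
    for s in states:
        result |= eclose_state(delta, s)
    return result


# Hypothetical example: 0 --ε--> 1 --ε--> 2 and 2 --a--> 3.
delta = {(0, ""): {1}, (1, ""): {2}, (2, "a"): {3}}
closure = eclose(delta, {0})
```

The loop terminates for the reason given in the definition: each round either adds a state or stops, and there are only finitely many states.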

Lemma 2.19 (ε-closure is a closure property).

Let $S \subseteq Q$. Then $\mathrm{eclose}(S)$ is ε-closed.

Proof. We need to show that $\mathrm{eclose}(\mathrm{eclose}(S)) = \mathrm{eclose}(S)$ for any $S \subseteq Q$. So
take any $r \in \mathrm{eclose}(\mathrm{eclose}(S))$. Then $r \in \mathrm{eclose}(t)$ for some $t \in \mathrm{eclose}(S)$, and
moreover $t \in \mathrm{eclose}(s)$ for some $s \in S$. So:

1. We can reach $t$ from $s$ by following a sequence of ε-transitions (that is,
transitions of the form $\delta(p, \epsilon)$).

2. We can reach $r$ from $t$ by following a sequence of ε-transitions.

Thus we can reach $r$ from $s$ by ε-transitions, and thus $r \in \mathrm{eclose}(s) \subseteq
\mathrm{eclose}(S)$. Our choice of $r$ was arbitrary, so $\mathrm{eclose}(\mathrm{eclose}(S)) \subseteq \mathrm{eclose}(S)$.

Finally, it is clear that $\mathrm{eclose}(S) \subseteq \mathrm{eclose}(\mathrm{eclose}(S))$, as $s \in \mathrm{eclose}(s)$ for any
state $s$.

Thus we have that $\mathrm{eclose}(\mathrm{eclose}(S)) = \mathrm{eclose}(S)$. □

The idea of including $\epsilon$ as a symbol to be processed by the transition function
is that we can ‘explore’ all transitions labelled by $\epsilon$ without reading the next
symbol in the input word. We now define the extended transition function of
an ε-NFA.

Definition 2.20 (Extended transition function for ε-NFA).

Let $E = (Q, \Sigma, \delta, q_0, F)$ be an ε-NFA. We define the extended transition function
of $E$, $\hat{\delta} : Q \times \Sigma^* \to \mathcal{P}(Q)$, inductively. Firstly, for each $q \in Q$, we set

$$\hat{\delta}(q, \epsilon) := \mathrm{eclose}(q)$$

Now, suppose $w = x\sigma \in \Sigma^*$ for some $\sigma \in \Sigma$. Then we do the following:

(1) Let $\{p_1, \dots, p_k\}$ be $\hat{\delta}(q, x)$.

(2) Let $\{r_1, \dots, r_m\} = \bigcup_{i=1}^{k} \delta(p_i, \sigma)$.

(3) Define $\hat{\delta}(q, x\sigma) := \bigcup_{j=1}^{m} \mathrm{eclose}(r_j)$.

In particular, the inductive definition looks like this:

$$\hat{\delta}(q, x\sigma) := \bigcup_{r \in \bigcup_{p \in \hat{\delta}(q, x)} \delta(p, \sigma)} \mathrm{eclose}(r)$$

So for any $w \in \Sigma^*$, $\hat{\delta}(q_0, w)$ gives us the set of states we end up at when we
input $w$ into $E$. We say that $E$ accepts $w$ if $\hat{\delta}(q_0, w)$ contains at least one state
from $F$.

The above definition is somewhat confusing, so here is a description of what
happens when we input a word $w$ into an ε-NFA $E = (Q, \Sigma, \delta, q_0, F)$ (and it is
best to picture a transition diagram $\mathcal{T}_E$ for $E$ when doing this):

$E$ first computes $\mathrm{eclose}(q_0)$ and places a red marker on all the states in
$\mathrm{eclose}(q_0)$; that is, places a red marker on all states which can be reached
from $q_0$ by following edges labelled $\epsilon$. Then $E$ takes $w$, reads the first symbol
$\sigma_1$ whilst ‘in’ the set of states $\mathrm{eclose}(q_0)$, and then evaluates the transition
functions to get a set of states $\bigcup_{q \in \mathrm{eclose}(q_0)} \delta(q, \sigma_1)$; this is done on the transition
diagram by following all edges labelled $\sigma_1$ out of red-marked states and placing
a blue marker on all the new states. $E$ then computes the ε-closure of all
these states, and simultaneously ‘moves to’ this closure; so we place extra blue
markers on all states which can be reached from the current blue-marked states
by following edges labelled $\epsilon$. Now remove all red markers. $E$ then reads the
next symbol $\sigma_2$ of $w$, evaluates $\delta(q, \sigma_2)$ for all $q$ in the states that it is currently
in, and then takes the ε-closure of these new states (this is the same process as
before, interchanging the roles of red and blue markers). Once $E$ has read all
the symbols in $w$, we are left with a transition diagram $\mathcal{T}_E$ with several markers
(all of the same colour) on states; if any of these markers lie on a state in $F$,
then $E$ accepts $w$.
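
The marker procedure just described can be sketched as a short simulation. The encoding is an assumption: the transition function is a dictionary from (state, symbol) pairs to sets of states, `""` stands for ε, and the example automaton (intended to accept words of the form a...ab...b) is hypothetical.

```python
def eclose(delta, states):
    """All states reachable from `states` along edges labelled ε ("")."""
    stack, seen = list(states), set(states)
    while stack:
        r = stack.pop()
        for t in delta.get((r, ""), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(delta, start, accepting, word):
    """Run the ε-NFA on `word`, mirroring the red/blue marker description."""
    current = eclose(delta, {start})          # red markers on eclose(q_0)
    for symbol in word:
        moved = set()
        for q in current:                     # follow edges labelled `symbol`
            moved |= delta.get((q, symbol), set())
        current = eclose(delta, moved)        # then close under ε-edges
    return bool(current & accepting)          # any marker on a state of F?


# Hypothetical ε-NFA: state 0 loops on a, ε-moves to 1, and state 1 loops on b.
delta = {(0, "a"): {0}, (0, ""): {1}, (1, "b"): {1}}
results = {w: accepts(delta, 0, {1}, w) for w in ["", "aabb", "b", "ba"]}
```

The variable `current` plays the role of the set of marked states; swapping marker colours corresponds to overwriting it on each symbol.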

Definition 2.21 (Language of an ε-NFA).

Let $E = (Q, \Sigma, \delta, q_0, F)$ be an ε-NFA. We define the language of $E$, $\mathcal{L}(E)$, to
be the words which are accepted by $E$. That is,

$$\mathcal{L}(E) := \{w \in \Sigma^* \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$$

This is the language of words over $\Sigma$ for which $\hat{\delta}(q_0, w)$ contains at least one
accepting state of $E$.

If we had a transition diagram for an ε-NFA $E$ containing no $\epsilon$’s (that is,
$\delta(q, \epsilon) = \emptyset$ for all $q \in Q$), then what we have is an NFA. Thus, all definitions
and results on ε-NFA’s from Section 2.5 carry over to NFA’s. Hence Section 2.3 is
somewhat redundant; we included it only to develop the intuition in a more
natural way.

It would seem as though ε-NFA’s are much more general than NFA’s and
DFA’s. However, we will again see that we do not get any new languages by
considering ε-NFA’s over NFA’s or DFA’s.

2.6. Equivalence of DFA’s and ε-NFA’s.

We show that regular languages are precisely those accepted by an ε-NFA. To
do this, we describe a process of converting an ε-NFA into a DFA, very similar
to the subset construction.

Definition 2.22 (The subset construction with ε-transitions).

Let $E = (Q_E, \Sigma, \delta_E, q_0, F_E)$ be an ε-NFA. We define the following construction
of a DFA $D = (Q_D, \Sigma, \delta_D, q_D, F_D)$, called the subset construction with
ε-transitions, to have:

(1) States $Q_D := \mathcal{P}(Q_E)$; the power set of $Q_E$.

(2) Start state $q_D := \mathrm{eclose}(q_0)$; the set of states in the ε-closure of $q_0$.

(3) Accept states $F_D := \{S \subseteq Q_E \mid S \cap F_E \neq \emptyset\}$; the set of subsets of $Q_E$
which intersect the accepting states $F_E$ of $E$.

(4) Transition function: for each $S \subseteq Q_E$ and each $\sigma \in \Sigma$, we define
$\delta_D(S, \sigma)$ as follows:

a) Let $S = \{p_1, \dots, p_k\}$

b) Let $\bigcup_{i=1}^{k} \delta_E(p_i, \sigma) = \{r_1, \dots, r_m\}$

c) Define $\delta_D(S, \sigma) := \bigcup_{j=1}^{m} \mathrm{eclose}(r_j)$

That is,

$$\delta_D(S, \sigma) := \mathrm{eclose}\Big( \bigcup_{p \in S} \delta_E(p, \sigma) \Big)$$

which is the set of states reached from the states in $S$ by going along
an edge labelled $\sigma$ followed by some number of edges labelled $\epsilon$.

Observe that the start state of $D$ constructed above is the ε-closure of $q_0$,
rather than the set containing just $q_0$. Also observe that, if $S$ is ε-closed, then
so is $\delta_D(S, \sigma)$ (as a subset of $Q_E$) for every $\sigma \in \Sigma$. Thus, given that our start
state of $D$, $\mathrm{eclose}(q_0)$, is ε-closed, we can only ever reach other ε-closed
sets via the transition function $\delta_D$.

Definition 2.23 (Accessible states of an automaton).

Let $A$ be any finite-state automaton (DFA, NFA, ε-NFA). The accessible states
of $A$ are those which are reachable from the start state by a finite number
of applications of the transition function $\delta_A$; that is, states $q$ for which $q =
\hat{\delta}_A(q_0, w)$ (or $q \in \hat{\delta}_A(q_0, w)$, whichever is relevant) for some word $w$. The rest
of the states are said to be inaccessible.

Thus we see that the accessible states of $D$ in Definition 2.22 must be ε-closed,
as the start state is.
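
As a rough illustration, the subset construction with ε-transitions can be carried out in code. One deliberate deviation from Definition 2.22: the sketch below builds only the accessible states of $D$ rather than the whole power set, which is enough to decide acceptance. The dictionary encoding (with `""` for ε), the function names and the example automaton are assumptions.

```python
def eclose(delta, states):
    """All states reachable from `states` along ε-edges ("" stands for ε)."""
    stack, seen = list(states), set(states)
    while stack:
        r = stack.pop()
        for t in delta.get((r, ""), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def subset_construction(delta, alphabet, start, accepting):
    """Return (states, delta_D, q_D, F_D), restricted to accessible states."""
    q_d = frozenset(eclose(delta, {start}))        # q_D := eclose(q_0)
    d_delta, f_d = {}, set()
    todo, seen = [q_d], {q_d}
    while todo:
        S = todo.pop()
        if S & accepting:                          # S meets F_E
            f_d.add(S)
        for a in alphabet:
            moved = set()
            for p in S:
                moved |= delta.get((p, a), set())
            T = frozenset(eclose(delta, moved))    # δ_D(S, a)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return seen, d_delta, q_d, f_d


def dfa_accepts(d_delta, q_d, f_d, word):
    state = q_d
    for a in word:
        state = d_delta[(state, a)]
    return state in f_d


# Hypothetical ε-NFA: state 0 loops on a, ε-moves to 1, and state 1 loops on b.
delta = {(0, "a"): {0}, (0, ""): {1}, (1, "b"): {1}}
states, d_delta, q_d, f_d = subset_construction(delta, "ab", 0, {1})
all_closed = all(frozenset(eclose(delta, S)) == S for S in states)
```

For this example three accessible states arise, namely {0, 1}, {1} and the empty set, and `all_closed` confirms the observation above that every accessible state is ε-closed.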

Theorem 2.24 (Extended transition function in the subset construction with
ε-transitions).

Let $D = (Q_D, \Sigma, \delta_D, q_D, F_D)$ be the DFA constructed from the ε-NFA $E =
(Q_E, \Sigma, \delta_E, q_0, F_E)$ via the subset construction with ε-transitions (Definition
2.22). Then, for any $w \in \Sigma^*$, we have

$$\hat{\delta}_D(q_D, w) = \hat{\delta}_E(q_0, w)$$





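Proof. We proceed by induction on $|w|$. If $|w| = 0$ (that is, $w = \epsilon$), then we
have that $\hat{\delta}_E(q_0, \epsilon) = \mathrm{eclose}(q_0)$. But for a DFA, we know that $\hat{\delta}_D(p, \epsilon) = p$ for
any state $p$. Thus $\hat{\delta}_D(q_D, \epsilon) = q_D = \mathrm{eclose}(q_0)$, and so $\hat{\delta}_E(q_0, \epsilon) = \hat{\delta}_D(q_D, \epsilon)$.

Now suppose that $\hat{\delta}_D(q_D, v) = \hat{\delta}_E(q_0, v)$ for all $v$ with $|v| \leq n$. Let $w$ be
a word of length $n + 1$. Then $w = x\sigma$, where $\sigma \in \Sigma$ is the final symbol of
$w$. By induction, as $|x| = n$, we have that $\hat{\delta}_D(q_D, x) = \hat{\delta}_E(q_0, x)$. Call this set
$\{p_1, \dots, p_k\} \subseteq Q_E$. By definition of $\hat{\delta}$ for ε-NFA’s (Definition 2.20), we compute
$\hat{\delta}_E(q_0, x\sigma)$ by

(1) Let $\{r_1, \dots, r_m\} = \bigcup_{i=1}^{k} \delta_E(p_i, \sigma)$.

(2) Then $\hat{\delta}_E(q_0, x\sigma) = \bigcup_{j=1}^{m} \mathrm{eclose}(r_j)$.

Now, we know that $\hat{\delta}_D(q_D, x\sigma) = \delta_D(\hat{\delta}_D(q_D, x), \sigma) = \delta_D(\{p_1, \dots, p_k\}, \sigma)$. So
by definition of the subset construction with ε-transitions (Definition 2.22), we
compute $\hat{\delta}_D(q_D, x\sigma)$ as follows:

(1) Let $\{r_1, \dots, r_m\} = \bigcup_{i=1}^{k} \delta_E(p_i, \sigma)$.

(2) Then $\hat{\delta}_D(q_D, x\sigma) = \bigcup_{j=1}^{m} \mathrm{eclose}(r_j)$.

But this is exactly the same set as $\hat{\delta}_E(q_0, x\sigma)$. Thus we have that $\hat{\delta}_D(q_D, w) =
\hat{\delta}_E(q_0, w)$, and the induction is complete. □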

We can now use this to say something about the accepted languages of these
automata.

Theorem 2.25 (Equivalence of language in the subset construction with
ε-transitions).

Let $D = (Q_D, \Sigma, \delta_D, q_D, F_D)$ be the DFA constructed from the ε-NFA $E =
(Q_E, \Sigma, \delta_E, q_0, F_E)$ via the subset construction with ε-transitions. Then $\mathcal{L}(D) =
\mathcal{L}(E)$.

Proof. From Theorem 2.24, we see that, for any $w \in \Sigma^*$,

$D$ accepts $w$ $\iff \hat{\delta}_D(q_D, w)$ contains a state in $F_E$
$\iff \hat{\delta}_E(q_0, w)$ contains a state in $F_E$
$\iff E$ accepts $w$

Thus $\mathcal{L}(D) = \mathcal{L}(E)$, as both $D$ and $E$ have the same alphabet $\Sigma$. □

Theorem 2.26 (Equivalence of DFA’s and ε-NFA’s).

A language $L$ is accepted by some DFA if and only if it is accepted by some
ε-NFA.

Proof. We showed in Theorem 2.25 that, from any ε-NFA, we can construct a
DFA on the same alphabet which accepts the same language.

So now suppose that $L$ is accepted by some DFA $D = (Q, \Sigma, \delta_D, q_0, F)$.
Take a transition diagram for $D$. Then this also defines a transition diagram for
an ε-NFA $E$ with the same states, same alphabet, and same accepting states,
provided we define $\delta_E(q, \epsilon) := \emptyset$ for all $q \in Q$. We need only remark that the
transition function $\delta_E$ for $E$ will be given by

$$\delta_E(q, \sigma) := \{\delta_D(q, \sigma)\} \quad \forall q \in Q, \sigma \in \Sigma$$

That is, if $\delta_D(q, \sigma) = p$, then $\delta_E(q, \sigma) = \{p\}$. Thus the transitions of $D$ and $E$
are the same. Moreover, there are no transitions out of any state on $\epsilon$, so
$\mathrm{eclose}(q) = \{q\}$ for every $q \in Q$. Thus $E$ is genuinely an ε-NFA. We prove by
induction on $|w|$ that, if $\hat{\delta}_D(q_0, w) = p$, then $\hat{\delta}_E(q_0, w) = \{p\}$.

Basis: Let $|w| = 1$, so $w = \sigma \in \Sigma$. We have defined $\delta_E(q_0, \sigma) = \{\delta_D(q_0, \sigma)\}$, so
we’re done.

Induction: Suppose, for all $v$ with $|v| \leq n$, we have $\hat{\delta}_E(q_0, v) = \{\hat{\delta}_D(q_0, v)\}$.
Take $w$ with $|w| = n + 1$, then $w = x\sigma$ for some $x \in \Sigma^*$, $\sigma \in \Sigma$. As every
ε-closure in $E$ is a singleton, we have that:

$$\begin{aligned}
\hat{\delta}_E(q_0, x\sigma) &= \bigcup_{p \in \hat{\delta}_E(q_0, x)} \delta_E(p, \sigma) \\
&= \bigcup_{p \in \{\hat{\delta}_D(q_0, x)\}} \delta_E(p, \sigma) \\
&= \delta_E(\hat{\delta}_D(q_0, x), \sigma) \\
&= \{\delta_D(\hat{\delta}_D(q_0, x), \sigma)\} \\
&= \{\hat{\delta}_D(q_0, x\sigma)\}
\end{aligned}$$

Thus $D$ and $E = (Q, \Sigma, \delta_E, q_0, F)$ accept the same words, so $\mathcal{L}(D) = \mathcal{L}(E)$. □

Again, as all NFA’s are ε-NFA’s, we see that the subset construction in Section
2.4 has now been made redundant, as we need only do things for ε-NFA’s.

So now we see that, whenever we want to show that a language $L$ is regular,
it suffices to produce a DFA or an NFA or an ε-NFA which accepts $L$. For
clarity, we will usually use $D$ to denote a DFA, $N$ to denote an NFA, and $E$ to
denote an ε-NFA (though we may, on occasion, use other letters also).

2.7. Regular expressions.

In the previous sections we saw various ‘mechanical’ ways of defining languages,
all of which define the same set of languages. We now give an algebraic
way to define languages, called regular expressions. Though seemingly different
to our mechanical definitions of DFA’s, NFA’s, and ε-NFA’s, we will show
that regular expressions define precisely the set of regular languages, hence the
name. We first need some algebraic operations on languages.

Definition 2.27 (Operations on languages).

(1) The union of two languages $L$ and $M$, denoted $L \cup M$, is the set of all
words which lie in either $L$ or $M$.

(2) The concatenation of two languages $L$ and $M$, denoted $LM$, is the set of
all words which are the concatenation of one word in $L$ followed by one
word in $M$. We adopt the notation $L^0 := \{\epsilon\}$, $L^1 := L$, $L^{n+1} := L^n L$
(the concatenation of $n + 1$ copies of $L$).

(3) The closure of a language $L$, denoted $L^*$, is the set of all words formed
by taking a finite number of words in $L$ (possibly with repetition), and
concatenating them. In particular, this is given by

$$L^* = \bigcup_{n \geq 0} L^n$$

Observe that $\emptyset^* = \{\epsilon\}$, and $\{\epsilon\}^* = \{\epsilon\}$; these are the only two languages
whose closure is not infinite.
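
For finite languages these operations can be tried out directly. In the sketch below a language is a Python set of strings and the empty string plays the role of ε; since $L^*$ is usually infinite, the (hypothetical) helper `star_up_to` only lists the words of $L^*$ up to a chosen length.

```python
def concat(L, M):
    """LM: every word of L followed by every word of M."""
    return {u + v for u in L for v in M}


def power(L, n):
    """L^n, using L^0 = {ε} and L^(k+1) = L^k L."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def star_up_to(L, max_len):
    """The words of L* whose length is at most max_len."""
    result, frontier = {""}, {""}
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - result        # keep only words not seen before
        result |= frontier
    return result


union_example = {"a", "ab"} | {"ab", "b"}           # union is plain set union
concat_example = concat({"a", "ab"}, {"b", ""})
square_example = power({"a", "b"}, 2)
star_example = star_up_to({"ab", "c"}, 3)
```

Calling `star_up_to(set(), 5)` and `star_up_to({""}, 5)` both give `{""}`, matching the observation that $\emptyset^*$ and $\{\epsilon\}^*$ are the only finite closures.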

We will now describe the algebra of regular expressions. If $R$ is a regular
expression, then we write $\mathcal{L}(R)$ for the language defined by $R$.

Definition 2.28 (Regular expressions).

A regular expression is any expression built from the following basis set with
the following inductive rules:

Basis:

(1) The constants $\boldsymbol{\epsilon}$ and $\boldsymbol{\emptyset}$ are regular expressions, denoting the languages
$\mathcal{L}(\boldsymbol{\epsilon}) := \{\epsilon\}$ and $\mathcal{L}(\boldsymbol{\emptyset}) := \emptyset$.

(2) If $a$ is a symbol, then $\mathbf{a}$ is a regular expression denoting the language
$\mathcal{L}(\mathbf{a}) := \{a\}$. We will always use boldface to denote the expression
corresponding to the symbol.

(3) A variable, written as a capital letter such as $M$, denotes a variable
representing any language.

Induction:

(4) If $E, F$ are regular expressions, then $E + F$ is a regular expression denoting
the union of their languages; $\mathcal{L}(E + F) := \mathcal{L}(E) \cup \mathcal{L}(F)$.

(5) If $E, F$ are regular expressions, then $EF$ is a regular expression denoting
the concatenation of their languages; $\mathcal{L}(EF) := \mathcal{L}(E)\mathcal{L}(F)$.

(6) If $E$ is a regular expression, then $E^*$ is a regular expression denoting
the closure of its language; $\mathcal{L}(E^*) := \mathcal{L}(E)^*$.

(7) If $E$ is a regular expression, then $(E)$ is a regular expression denoting
the same language; $\mathcal{L}((E)) := \mathcal{L}(E)$. This is used to remove ambiguity
when writing expressions such as $(E + F)^*$, which is different to $E + F^*$.

Definition 2.29 (Order of precedence). The order of precedence for regular 
expressions is: 

(1) Parentheses ( ) 

(2) Closure * 

(3) Concatenation 

(4) Union + 

For example, the expression 01* + 1 should be read (0(1*)) + 1. 

2.8. Equivalence of DFA’s and regular expressions.

We can now show that regular expressions give us precisely the set of regular
languages. First, we show how to construct an ε-NFA from a regular expression.

Theorem 2.30 (Constructing an ε-NFA from a regular expression).

For each regular expression $R$ there is an associated ε-NFA $E$ such that $\mathcal{L}(E) =
\mathcal{L}(R)$.

Proof. We will construct an ε-NFA $E$ with the following properties:

(1) Exactly one accepting state.

(2) No arcs into the initial state.

(3) No arcs out of the accepting state.

We build up such an ε-NFA inductively, in the same way that we defined regular
expressions inductively.

Basis:

We construct an ε-NFA for each of the basis regular expressions. We do this
for $\boldsymbol{\epsilon}$ in Figure 1, $\boldsymbol{\emptyset}$ in Figure 2, and $\mathbf{a}$ (for $a$ some symbol) in Figure 3. There is
no need for us to label the states, but observe that each ε-NFA in Figures 1-3
satisfies each of the three properties listed above.

Figure 1: An ε-NFA which accepts the language $\{\epsilon\}$.

Figure 2: An ε-NFA which accepts the language $\emptyset$.

Figure 3: An ε-NFA which accepts the language $\{a\}$.

Induction:

We assume that we are given regular expressions $R$ and $S$ with corresponding
ε-NFA’s which satisfy properties (1), (2), (3) above, and now we show how to
construct ε-NFA’s with the same language as $R + S$, $RS$ and $R^*$ respectively.
We demonstrate in Figure 4 how we will represent the ε-NFA for a regular
expression $R$ (which satisfies properties (1), (2), (3) above) within a larger
ε-NFA. As each ε-NFA for our basis regular expressions has precisely one start
and one accept state, we take these as the two ‘end points’ of the ε-NFA, and
use these to join them into larger ε-NFA’s. All the ε-NFA’s that we will now
construct also have precisely one start and one accept state, so this ‘joining’
process will always work.

Figure 4: Representing the ε-NFA for the regular expression $R$ within a larger
ε-NFA.

An ε-NFA for $R + S$:

The ε-NFA $E$ in Figure 5 gives precisely the same language as $R + S$. First,
take a word $w \in \mathcal{L}(R) \cup \mathcal{L}(S)$. If we start at the start state of $E$, then we can
simultaneously ε-transit to the beginning of the ε-NFA for $R$, and the beginning
of the ε-NFA for $S$. As $w$ lies in the language of either $R$ or $S$, then after reading
all of $w$ we will eventually reach the accept state of that respective ε-NFA (or
both). Then we can ε-transit to the accept state of $E$.

Conversely, suppose $w$ is accepted by $E$. Then, starting from the start state of
$E$, we ε-transit to the start states of the ε-NFA’s of $R$ and $S$, simultaneously.
To reach the accept state of $E$ we must take the final ε-transition out of the
accept state of one of the ε-NFA’s of $R$ and $S$, and by that point we must have
read all of $w$. So $w$ is accepted by the ε-NFA for $R$ or by the ε-NFA for $S$;
that is, $w \in \mathcal{L}(R) \cup \mathcal{L}(S)$.

Thus the language of $E$ is $\mathcal{L}(R) \cup \mathcal{L}(S)$.

An ε-NFA for $RS$:

The ε-NFA $E$ in Figure 6 gives precisely the same language as $RS$. First, take
a word $w \in \mathcal{L}(R)\mathcal{L}(S)$, so $w = uv$ for some $u \in \mathcal{L}(R)$, $v \in \mathcal{L}(S)$. If we start at
the start state of $E$, then we can ε-transit to the start state of the ε-NFA for
$R$. After reading all of $u$ we will eventually reach the accept state of the ε-NFA
for $R$. Then we can ε-transit to the start state of the ε-NFA for $S$, with $v$ still
to read. But after reading all of $v$ we will eventually reach the accept state of
the ε-NFA for $S$, which is the accept state of $E$.

Conversely, suppose $w$ is accepted by $E$. Then, starting from the start state
of $E$, we ε-transit to the start state of the ε-NFA for $R$. Then, to reach the
accept state of this, we first have that $w$ has some prefix $u \in \mathcal{L}(R)$ which we
read and get to the accept state of $R$. Then we ε-transit to the start state of
the ε-NFA for $S$. But to reach the accept state of the ε-NFA for $S$ (that is, the
accept state of $E$), we must have that the entire remaining suffix $v$ of $w$ lies in
$\mathcal{L}(S)$. That is, $w = uv$ for some $u \in \mathcal{L}(R)$, $v \in \mathcal{L}(S)$.

Thus the language of $E$ is $\mathcal{L}(R)\mathcal{L}(S)$.

An ε-NFA for $R^*$:

The ε-NFA $E$ in Figure 7 gives precisely the same language as $R^*$. First, take
a word $w \in \mathcal{L}(R^*)$. If $w = \epsilon$ then we can ε-transit directly to the accept state
of $E$. If $w \neq \epsilon$, then we have that $w = u_1 \dots u_m$ for some $m > 0$ and some
collection of non-empty $u_i \in \mathcal{L}(R)$. So start at the start state of $E$, then ε-transit
to the start state of the ε-NFA for $R$. After reading all of $u_1$ we will eventually
reach the accept state of the ε-NFA for $R$. Then we can ε-transit back to the
start state of the ε-NFA for $R$, with $u_2 \dots u_m$ still to read. Repeat this process
a total of $m$ times; then we’re left at the accept state of the ε-NFA for $R$ with
nothing left to read, and thus can ε-transit to the accept state of $E$.

Conversely, suppose $w$ is accepted by $E$. Then, starting from the start state of
$E$, we either ε-transit to the accept state of $E$ and are accepted (thus giving
$w = \epsilon \in \mathcal{L}(R^*)$), or we ε-transit to the start state of the ε-NFA for $R$. Then, to
reach the accept state of the ε-NFA for $R$, we must have that $w$ has some prefix
$u_1 \in \mathcal{L}(R)$ (so $w = u_1 v$). So we read such a prefix, and then reach the accept
state of the ε-NFA for $R$ with the remaining suffix left to read. If this suffix $v$
is empty we are done (we ε-transit to the accept state of $E$ and are accepted).
Otherwise, the only other option is to ε-transit back to the start state of the
ε-NFA for $R$, with $v$ still to read. We keep repeating this process, and the only
way for $w$ to be accepted by $E$ is if, after $m$ such cycles, we have reached the
accept state of the ε-NFA for $R$ and have nothing left to read (so that we can
ε-transit to the accept state of $E$), and thus $w = u_1 \dots u_m$ for $u_i \in \mathcal{L}(R)$, and
so $w \in \mathcal{L}(R^*)$.

Thus the language of $E$ is $\mathcal{L}(R^*)$.


Figure 5: An ε-NFA which accepts the language $\mathcal{L}(R) \cup \mathcal{L}(S)$.

Figure 6: An ε-NFA which accepts the language $\mathcal{L}(R)\mathcal{L}(S)$.

Figure 7: An ε-NFA which accepts the language $\mathcal{L}(R)^*$.

All the ε-NFA’s we constructed in Figures 5-7 satisfy properties (1), (2), (3)
(assuming the ε-NFA’s for $R$ and $S$ also satisfy these). Thus our induction is
complete; we can build up an ε-NFA for any regular expression. □
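
The inductive construction in this proof can be sketched in code. Everything about the encoding is an assumption: regular expressions are nested tuples such as `("union", r, s)`, states are integers drawn from a counter, and `""` stands for ε. The wiring of the new ε-arcs is read off from the proof text above, since the figures themselves are not reproduced in these notes.

```python
import itertools

_fresh = itertools.count()   # supplies new state names 0, 1, 2, ...


def build(regex):
    """Return (delta, start, accept) for a regular expression tree."""
    kind = regex[0]
    delta = {}

    def arc(p, a, q):
        delta.setdefault((p, a), set()).add(q)

    if kind in ("eps", "empty", "sym"):             # basis cases
        start, accept = next(_fresh), next(_fresh)
        if kind == "eps":
            arc(start, "", accept)
        elif kind == "sym":
            arc(start, regex[1], accept)
        return delta, start, accept                 # "empty" has no arcs

    parts = [build(sub) for sub in regex[1:]]
    for d, _, _ in parts:
        delta.update(d)                             # state names are disjoint
    if kind == "union":
        (_, s1, a1), (_, s2, a2) = parts
        start, accept = next(_fresh), next(_fresh)
        arc(start, "", s1); arc(start, "", s2)
        arc(a1, "", accept); arc(a2, "", accept)
    elif kind == "concat":
        (_, s1, a1), (_, s2, accept) = parts        # accept state of S is kept
        start = next(_fresh)
        arc(start, "", s1)
        arc(a1, "", s2)
    elif kind == "star":
        ((_, s1, a1),) = parts
        start, accept = next(_fresh), next(_fresh)
        arc(start, "", accept); arc(start, "", s1)
        arc(a1, "", s1); arc(a1, "", accept)
    else:
        raise ValueError(f"unknown node {kind!r}")
    return delta, start, accept


def accepts(automaton, word):
    """Simulate the ε-NFA returned by build on `word`."""
    delta, start, accept = automaton

    def eclose(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in delta.get((stack.pop(), ""), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = eclose({start})
    for a in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eclose(moved)
    return accept in current


# (0 + 1)* 1: the words over {0, 1} that end in 1.
ends_in_one = build(("concat",
                     ("star", ("union", ("sym", "0"), ("sym", "1"))),
                     ("sym", "1")))
```

Each case adds arcs only out of new start states and out of old accept states, so properties (1), (2), (3) are preserved at every step, as the proof requires.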

We now show how to construct a regular expression from a DFA.

Theorem 2.31 (Constructing a regular expression from a DFA).

For each DFA $D$ there is an associated regular expression $R$ such that $\mathcal{L}(D) =
\mathcal{L}(R)$.

Proof. For each DFA $D = (Q, \Sigma, \delta, q_0, F)$, we construct a regular expression $R$
as follows:

First we number the states of $D$ by $\{1, \dots, n\}$ (this re-naming will be helpful
later). Now we will inductively define regular expressions $R_{ij}^{(k)}$ ($1 \leq i, j \leq n$,
$0 \leq k \leq n$) whose languages are the words $w$ which begin at state $i$ and end at
state $j$ without passing through any intermediate state whose number is more
than $k$ (think of $w$ as giving a path in a transition diagram for $D$). Note that
the beginning and end points are not intermediate, so we may have $i$ and/or $j$
being greater than $k$.

We first consider the case $k = 0$. Note that every state is numbered greater
than 0, so $R_{ij}^{(0)}$ describes only paths from $i$ to $j$ with no intermediate states. That is,
direct edges from $i$ to $j$. If $i \neq j$, then this will be single edges from $i$ to $j$. If
$i = j$, then this is the path of length 0 (that is, $\epsilon$), as well as all loops from $i$ to
itself. So let $\{a_1, \dots, a_l\}$ be all the symbols labelling arcs from $i$ to $j$ (that is,
all the symbols $a \in \Sigma$ for which $\delta(i, a) = j$). Then we define

$$\text{For } i \neq j, \quad R_{ij}^{(0)} := \begin{cases} \mathbf{a}_1 + \dots + \mathbf{a}_l & \text{if } l > 0 \\ \boldsymbol{\emptyset} & \text{if } l = 0 \end{cases}$$

$$\text{For } i = j, \quad R_{ii}^{(0)} := \begin{cases} \mathbf{a}_1 + \dots + \mathbf{a}_l + \boldsymbol{\epsilon} & \text{if } l > 0 \\ \boldsymbol{\epsilon} & \text{if } l = 0 \end{cases}$$

We have added the $\boldsymbol{\epsilon}$ in the case $i = j$ to cover the situation where we have the
path of length 0 from $i$ to itself. Thus we have a regular expression $R_{ij}^{(0)}$
which gives us the language we desire.

Now we must inductively define the expression $R_{ij}^{(k)}$, assuming we have defined
$R_{ij}^{(t)}$ for all $t < k$ and all $i, j$. So, suppose we have a path from state $i$ to
state $j$ that does not pass through any intermediate state higher than $k$. This
falls into one of two cases:

Case 1. The path does not pass through state $k$ at all, in which case the path
gives a word which lies in the language of $R_{ij}^{(k-1)}$.

Case 2. The path passes through state $k$ (as an intermediate state) some number
of times. In this case, we can break up the path into three pieces:

Piece 1. A (nonempty) path from state $i$ to state $k$ that does not pass through
state $k$ in any intermediate step. This will lie in the language of $R_{ik}^{(k-1)}$.

Piece 2. A (possibly empty) path from state $k$ to state $k$ which does not pass
through state $k$ in any intermediate step, followed by another such path, and
so on (finitely many times). These sub-pieces will lie in the language of $R_{kk}^{(k-1)}$
and so the entire piece will be concatenations of these, and thus lie in the
language of $(R_{kk}^{(k-1)})^*$.

Piece 3. A (nonempty) path from state $k$ to state $j$ that does not pass through
state $k$ in any intermediate step. This will lie in the language of $R_{kj}^{(k-1)}$.

So in case 2, our word lies in the concatenation of the languages from piece 1,
then piece 2, then piece 3. That is, it lies in the language of $R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$.
Adding the regular expressions from the two cases (that is, taking the union of
the two languages), we have the regular expression $R_{ij}^{(k-1)} + R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$
whose language contains all words $w$ which begin at state $i$ and end at state $j$
without passing through any intermediate state whose number is more than $k$.
But clearly this defines exactly such words, as its language contains no more than these
words. Thus we define the regular expression

$$R_{ij}^{(k)} := R_{ij}^{(k-1)} + R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$$

for the language of all words representing paths from state $i$ to state $j$ which
do not pass through any intermediate state higher than $k$.

Now, given that we only have $n$ states in total, we see that the language
of $R_{ij}^{(n)}$ will be all words representing paths which start at state $i$ and end
at state $j$. Now assume that state 1 is the start state (our initial numbering
was arbitrary, so we could have easily defined it that way). Take the sum of
all expressions of the form $R_{1j}^{(n)}$ for $j$ an accepting state, and the associated
language will be the language of $D$. That is, we have just proved that

$$\mathcal{L}(D) = \mathcal{L}\big(R_{1 j_1}^{(n)} + \dots + R_{1 j_s}^{(n)}\big) \quad \text{where } F = \{j_1, \dots, j_s\}$$

□
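
The recursion for $R_{ij}^{(k)}$ can be carried out by a short program. The sketch below assumes a dictionary encoding of the DFA, represents regular expressions as nested tuples, and makes no attempt to simplify them (for instance, it writes $\emptyset + a$ where the proof writes $a$, which denotes the same language). The helper `words`, which lists the words of a regular expression up to a given length, is included only so that the result can be compared with the DFA.

```python
from itertools import product

EMPTY, EPS = ("empty",), ("eps",)


def kleene(n, alphabet, delta, accepting):
    """Regular expression tree for a DFA with states 1..n and start state 1."""
    # R[i][j] holds R_ij^(k) for the current k (row and column 0 are unused).
    R = [[EMPTY] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            expr = EPS if i == j else EMPTY        # path of length 0 if i = j
            for a in alphabet:
                if delta[(i, a)] == j:
                    expr = ("union", expr, ("sym", a))
            R[i][j] = expr
    for k in range(1, n + 1):
        new = [[EMPTY] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
                through_k = ("concat", R[i][k],
                             ("concat", ("star", R[k][k]), R[k][j]))
                new[i][j] = ("union", R[i][j], through_k)
        R = new
    result = EMPTY
    for j in sorted(accepting):                    # sum over accepting states
        result = ("union", result, R[1][j])
    return result


def words(expr, max_len):
    """The words of length <= max_len in the language of `expr`."""
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {expr[1]} if max_len >= 1 else set()
    if kind == "union":
        return words(expr[1], max_len) | words(expr[2], max_len)
    if kind == "concat":
        left, right = words(expr[1], max_len), words(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    base = words(expr[1], max_len)                 # kind == "star"
    result, frontier = {""}, {""}
    while frontier:
        longer = {u + v for u in frontier for v in base
                  if len(u + v) <= max_len}
        frontier = longer - result
        result |= frontier
    return result


# Hypothetical DFA: state 1 = even number of 1's read so far, state 2 = odd.
delta = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 1}
regex = kleene(2, "01", delta, {1})
expected = {"".join(p) for m in range(5) for p in product("01", repeat=m)
            if p.count("1") % 2 == 0}
```

The unsimplified expressions roughly quadruple in size with each increase of $k$, so this is only practical for very small automata; the comparison of `words(regex, 4)` with `expected` is a finite sanity check, not a proof.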

So now we see that, whenever we want to show that a language $L$ is regular,
it suffices to produce a DFA or an NFA or an ε-NFA or a regular expression
whose associated language is $L$. For clarity, we will usually use $D$ to denote a
DFA, $N$ to denote an NFA, $E$ to denote an ε-NFA, and $R$ to denote a regular
expression (though we may, on occasion, use other letters).

Having all these equivalent definitions at our disposal may seem confusing at 
first. But ultimately it is quite helpful, as there are various languages which are 
very easily shown to be regular with one definition, but not the others. Also, 
keep in mind that it is not always obvious which definition will be the easiest 
to use to show that a language is regular. 

2.9. Closure properties of regular languages.

We will now use the various definitions of regular languages to show some
closure properties.

Theorem 2.32 (Closure under union).

Let $L, M$ be regular languages over $\Sigma, \Gamma$ respectively. Then $L \cup M$ is a regular
language over $\Sigma \cup \Gamma$.

Proof. Let $R_L, R_M$ be regular expressions for $L, M$ respectively. Then $R_L + R_M$
is a regular expression, and $\mathcal{L}(R_L + R_M) = \mathcal{L}(R_L) \cup \mathcal{L}(R_M) = L \cup M$. □

Theorem 2.33 (Closure under complementation).

Let $L$ be a regular language over $\Sigma$. Then the complement of $L$, $\bar{L} := \Sigma^* \setminus L$, is
a regular language over $\Sigma$.

Proof. Let $D = (Q, \Sigma, \delta, q_0, F)$ be a DFA such that $L = \mathcal{L}(D)$. Now define a
new DFA $\bar{D} := (Q, \Sigma, \delta, q_0, Q \setminus F)$. Notice that, for any $w \in \Sigma^*$, we have

$$\begin{aligned}
w \in \bar{L} &\iff w \in \Sigma^* \setminus L \\
&\iff w \notin L \\
&\iff \hat{\delta}(q_0, w) \notin F \\
&\iff \hat{\delta}(q_0, w) \in Q \setminus F \\
&\iff w \in \mathcal{L}(\bar{D})
\end{aligned}$$

Thus $\bar{L} = \mathcal{L}(\bar{D})$, and so $\bar{L}$ is regular. □

Observe that the above proof only works because $D$ and $\bar{D}$ are deterministic
(if not, then we could not conclude that $\hat{\delta}(q_0, w) \notin F \iff \hat{\delta}(q_0, w) \in Q \setminus F$). Thus
the above argument would not work (as written) if we were using an NFA to
describe $L$. This is an example of where one description of regular languages is
more convenient than another.
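
Both halves of this remark can be seen on small examples. The sketch below assumes a dictionary encoding of transition functions; the DFA (accepting the words over {0, 1} that end in 1) and the one-symbol NFA are hypothetical.

```python
from itertools import product


def dfa_accepts(delta, start, accepting, word):
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def nfa_accepts(delta, start, accepting, word):
    current = {start}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & accepting)


# Hypothetical DFA over {0, 1} accepting the words that end in 1.
states = {"no", "yes"}
delta = {("no", "0"): "no", ("no", "1"): "yes",
         ("yes", "0"): "no", ("yes", "1"): "yes"}
accepting = {"yes"}
co_accepting = states - accepting              # Q \ F

all_words = ["".join(p) for m in range(5) for p in product("01", repeat=m)]
dfa_flips = all(
    dfa_accepts(delta, "no", co_accepting, w)
    != dfa_accepts(delta, "no", accepting, w)
    for w in all_words
)

# Hypothetical NFA over {a}: on reading a, state 0 may stay at 0 or move to 1.
n_delta = {(0, "a"): {0, 1}}
both_accept_a = (nfa_accepts(n_delta, 0, {1}, "a")
                 and nfa_accepts(n_delta, 0, {0}, "a"))
```

For the DFA, swapping accepting and non-accepting states flips the verdict on every word tested. For the NFA, the word a is accepted both before and after the swap, so swapping does not give the complement.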


Theorem 2.34 (Closure under concatenation).

Let $L, M$ be regular languages over $\Sigma, \Gamma$ respectively. Then $LM$ is a regular
language over $\Sigma \cup \Gamma$.

Proof. Let $R_L, R_M$ be regular expressions for $L, M$ respectively. Then $R_L R_M$
is a regular expression, and $\mathcal{L}(R_L R_M) = \mathcal{L}(R_L)\mathcal{L}(R_M) = LM$. □

Theorem 2.35 (Closure under closure operator).

Let $L$ be a regular language over $\Sigma$. Then $L^*$ is a regular language over $\Sigma$.

Proof. Let $R$ be a regular expression for $L$. Then $(R)^*$ is a regular expression,
and $\mathcal{L}((R)^*) = \mathcal{L}(R)^* = L^*$. □

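Theorem 2.36 (Closure under intersection).

Let $L, M$ be regular languages over $\Sigma, \Gamma$ respectively. Then $L \cap M$ is a regular
language over $\Sigma \cap \Gamma$.

Proof. First regard $L$ and $M$ as languages over $\Sigma \cup \Gamma$; they are still regular,
as regular expressions for them over $\Sigma, \Gamma$ are also regular expressions over
$\Sigma \cup \Gamma$. Taking all complements in $(\Sigma \cup \Gamma)^*$, we see that $\bar{L}$ and $\bar{M}$ are both
regular over $\Sigma \cup \Gamma$ (by Theorem 2.33). So $\bar{L} \cup \bar{M}$ is regular over $\Sigma \cup \Gamma$ (by Theorem
2.32). Thus $\overline{\bar{L} \cup \bar{M}}$ is also regular over $\Sigma \cup \Gamma$ (by Theorem 2.33). But by
De Morgan’s laws, we have that $L \cap M = \overline{\bar{L} \cup \bar{M}}$,
so $L \cap M$ is regular over $\Sigma \cup \Gamma$. But clearly $L \cap M \subseteq \Sigma^* \cap \Gamma^* = (\Sigma \cap \Gamma)^*$,
so $L \cap M$ is a regular language over $\Sigma \cap \Gamma$. □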


Definition 2.37. Let $\sigma_1 \dots \sigma_m \in \Sigma^*$ be some word. We define its reverse by

$$(\sigma_1 \dots \sigma_m)^R := \sigma_m \dots \sigma_1$$

(that is, the word $\sigma_1 \dots \sigma_m$ written in reverse). If $L$ is a language, we define
the reverse of $L$, $L^R$, to be

$$L^R := \{v^R \mid v \in L\}$$

(that is, the language consisting of the reverses of all the words in $L$).

Theorem 2.38. Let $L$ be a regular language. Then $L^R$ is also regular.

Proof. Take a DFA $D = (Q_D, \Sigma, \delta_D, q_0, F_D)$ for which $\mathcal{L}(D) = L$. From this we
construct a ‘reverse ε-NFA’ $E$ from $D$ with

(1) States $Q_E := Q_D \cup \{z\}$ (same as $D$, with one extra state $z$ added).

(2) Alphabet $\Sigma$ (same as $D$).

(3) Start state $z$.

(4) Accept states $F_E := \{q_0\}$; the start state of $D$.

(5) Transition function: $\delta_E : Q_E \times (\Sigma \cup \{\epsilon\}) \to \mathcal{P}(Q_E)$ given by

$$\delta_E(q, \epsilon) := \begin{cases} F_D & \text{if } q = z \\ \emptyset & \text{if } q \neq z \end{cases}$$

$$\delta_E(q, \sigma) := \{p \in Q_D \mid \delta_D(p, \sigma) = q\}$$

That is, we reverse all existing transitions in $D$, then add a new state $z$, and
finally add ε-transitions from $z$ to all states in $F_D$. Thus $w \in \mathcal{L}(D) \iff w^R \in
\mathcal{L}(E)$, so $L^R$ is regular. □
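
The reverse ε-NFA is easy to build and test on a small example. The sketch below assumes a dictionary encoding of transition functions with `""` standing for ε, assumes that the name chosen for the new state `z` is not already a state of $D$, and uses a hypothetical three-state DFA accepting the words over {0, 1} that start with 0.

```python
from itertools import product


def reverse_enfa(delta_d, start_d, accepting_d, z="z"):
    """Return (delta_E, start, accepting) for the reverse ε-NFA.

    `z` names the new start state and is assumed not to be a state of D.
    """
    delta_e = {(z, ""): set(accepting_d)}          # ε-arcs from z into F_D
    for (p, a), q in delta_d.items():
        delta_e.setdefault((q, a), set()).add(p)   # reverse the arc p --a--> q
    return delta_e, z, {start_d}


def dfa_accepts(delta, start, accepting, word):
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in accepting


def enfa_accepts(delta, start, accepting, word):
    def eclose(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in delta.get((stack.pop(), ""), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    current = eclose({start})
    for a in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eclose(moved)
    return bool(current & accepting)


# Hypothetical DFA: "s" is the start state, "y" accepts, "d" is a dead state.
delta_d = {("s", "0"): "y", ("s", "1"): "d",
           ("y", "0"): "y", ("y", "1"): "y",
           ("d", "0"): "d", ("d", "1"): "d"}
delta_e, start_e, accepting_e = reverse_enfa(delta_d, "s", {"y"})

all_words = ["".join(p) for m in range(5) for p in product("01", repeat=m)]
reversal_ok = all(
    enfa_accepts(delta_e, start_e, accepting_e, w[::-1])
    == dfa_accepts(delta_d, "s", {"y"}, w)
    for w in all_words
)
```

Here the reversed language consists of the words that end in 0, and `reversal_ok` checks the equivalence $w \in \mathcal{L}(D) \iff w^R \in \mathcal{L}(E)$ on all words of length at most 4.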

2.10. The pumping lemma and non-regular languages.

Now that we have a long list of ways to show that languages are regular, it is
time to develop some more tools, this time to show that certain languages are
not regular.

Suppose we have a DFA $D = (Q, \Sigma, \delta, q_0, F)$, and a word $w$ which is accepted
by $D$. Suppose moreover that $|w| > |Q|$, that is, $w$ has more symbols than
$D$ has states. Consider the transition diagram $\mathcal{T}_D$ of $D$. By the pigeonhole
principle, we must have that $w$ defines a path in $\mathcal{T}_D$ which visits the same
state twice. So we can break up $w$ into 3 subwords $w = xyz$, with $y$ defining a
non-empty path from some state $q$ back to itself. So $y$ gives us a ‘loop’ in the
transition diagram. Now consider what would happen if we were to remove $y$
from $w$, to have the word $xz$. Or if we were to ‘do $y$ again’, to have the word
$xyyz$. Would these new words be accepted by $D$?

Theorem 2.39 (The pumping lemma for regular languages).

Let $L$ be a regular language. Then there exists a constant $n$ (depending on $L$)
such that for every word $w \in L$ with $|w| \geq n$ we can break up $w$ into 3 words
$w = xyz$ such that:

(1) $y \neq \epsilon$.

(2) $|xy| \leq n$.

(3) For all $k \geq 0$, we have that the word $xy^k z$ is also in $L$.

This theorem is named the pumping lemma because for each word $w$ of sufficient
length we can find a subword $y$ which we can pump; that is, we can repeat
$y$ as many times as we like.

Proof. As L is regular, we have that L = C(D) for some DFA D = ( Q , E, 6, qo, F). 
Suppose that D has n states (|<2| = n). Now, take any accepted word w € L 
with |tc| > n, say w = or ... a m ; m > n, where each oj € X. Let pi be the 
state that D is in after reading the subword or ... (7* (1 < i < to ). That is, 
Pi ■= 5(qo,cri ... ai). Define p 0 := S(q 0 , e) = q 0 . 

By the pigeonhole principle, the n + 1 pfs {po, ■ ■ ■ ,p n } must have some re¬ 
peated state, as there are only n different states of D. So we can find two 
integers 0 < l < r < n with pi = p r . So we now break up w as w = xyz, where 

x = o\.. .a i (or e if l = 0) 
y = <7; + i ... oy (and so \xy\ < n and y e) 
z = oy + i ... a m (or e if r = n) 

That is, $x$ traces a path from $p_0$ to $p_l$, $y$ traces a loop from $p_l$ back to itself 
(as $p_l = p_r$), and $z$ takes us to some accepting state $q_t$, as $w$ is an accepted word. 
Note that $x$ and/or $z$ are permitted to be empty, but by definition we have that 
$y$ is non-empty (as $l < r$). Now consider what happens when we input $xy^k z$ 
into $D$: 

If $k = 0$, then we go from $q_0$ (which is $p_0$) to $p_l$ on a path traced by $x$. We 
then go from $p_r$ (which is $p_l$) to the accepting state $q_t$ (which is $p_m$) on a path 
traced by $z$. Thus $xz$ traces a path from $q_0$ to the accept state $q_t$ (the same 
accept state that $w$ ends at), and so $xz \in L$. 

If $k \geq 1$, then we go from $q_0$ (which is $p_0$) to $p_l$ on a path traced by $x$. We 
then loop from $p_l$ back to itself $k$ times on a path traced by $y$ (recalling that 
$p_l = p_r$). We then go from $p_r$ to the accepting state $q_t$ (which is $p_m$) on a path 
traced by $z$. Thus $xy^k z$ traces a path from $q_0$ to the accept state $q_t$ (the same 
accept state that $w$ ends at), and so $xy^k z \in L$. □ 
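
The pigeonhole argument in this proof is constructive, and it may help to see it run. The following is only a minimal sketch: the dictionary encoding of the transition function, the names pump_split and accepts, and the three-state example DFA are all assumptions made for illustration, not part of the course material.

```python
def pump_split(delta, start, word):
    """Split word as (x, y, z) at the first repeated state, as in the proof."""
    seen = {start: 0}            # state -> index i with p_i equal to that state
    state = start
    for i, symbol in enumerate(word, start=1):
        state = delta[(state, symbol)]
        if state in seen:        # found l < r with p_l = p_r
            l, r = seen[state], i
            return word[:l], word[l:r], word[r:]
        seen[state] = i
    return None                  # no state repeats: the word was too short


def accepts(delta, start, accepting, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical DFA over {0, 1}: accepts words whose number of 1s is divisible by 3.
DELTA = {(q, "0"): q for q in range(3)}
DELTA.update({(q, "1"): (q + 1) % 3 for q in range(3)})

x, y, z = pump_split(DELTA, 0, "10101")   # x = "1", y = "0", z = "101"
```

Since the first repetition must occur within the first n symbols read, the split returned always has |xy| at most the number of states, and every word x y^k z is accepted whenever the original word is.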

Here is a typical way of using the pumping lemma to show that a language 
is not regular: 

Example 2.40. The language $L = \{0^n 1^n \mid n \geq 1\}$, of all words consisting of 
some number of 0’s followed by the same number of 1’s, is not regular. 

Proof. Suppose $L$ were regular; we proceed by contradiction using the pumping 
lemma. If $L$ were regular, then we would have some constant $N$ satisfying the 
hypotheses of the pumping lemma. So consider the word $w = 0^N 1^N$. Then 
$w \in L$. Moreover, by the pumping lemma, we can break up $w = xyz$ such that: 

1. $y \neq \epsilon$. 

2. $|xy| \leq N$. 

3. For all $k \geq 0$, we have that the word $xy^k z$ is also in $L$. 

As $|xy| \leq N$, we must have that $xy = 0^m$ for some $m \leq N$ (as the first $N$ 
symbols in $w$ are all 0’s), and moreover that $y = 0^l$ for some $0 < l \leq m \leq N$. 
Thus, by the pumping lemma, we must have that $xz \in L$. But $xz = 0^{N-l} 1^N$, 
which is not in $L$ as $N - l \neq N$. □ 

We could have also argued that, by the pumping lemma, we must have that 
$xy^2 z \in L$. But $xy^2 z = 0^{N+l} 1^N$, which is again not in $L$. 

The pumping lemma will be the usual tool we will use to show certain 
languages are not regular. The standard technique for this is: 

(1) Take a language L and assume that it is regular. 

(2) Suppose there is an N which satisfies the hypothesis of the pumping 
lemma. 

(3) Choose a word w in L and suppose it has a decomposition w = xyz as 
per the pumping lemma. 

(4) For any such decomposition of $w$ as above, show that there is a suitable 
power $k \geq 0$ of $y$ such that $xy^k z \notin L$. 
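
For a fixed value of N, step (4) can be checked by brute force, which is a useful sanity check on a proposed choice of w (though of course not a proof, since the argument must work for every N). The sketch below does this for the language of Example 2.40 with w = 0^N 1^N; the function names and the bound max_k are assumptions made for illustration.

```python
def in_language(word):
    """Membership test for L = {0^n 1^n : n >= 1}."""
    n = len(word) // 2
    return n >= 1 and word == "0" * n + "1" * n


def surviving_splits(N, max_k=3):
    """Return the splits w = xyz of w = 0^N 1^N (y non-empty, |xy| <= N)
    for which x y^k z stays in L for every k in 0..max_k."""
    w = "0" * N + "1" * N
    survivors = []
    for end in range(1, N + 1):        # |xy| = end <= N
        for begin in range(end):       # |x| = begin, so y = w[begin:end] is non-empty
            x, y, z = w[:begin], w[begin:end], w[end:]
            if all(in_language(x + y * k + z) for k in range(max_k + 1)):
                survivors.append((x, y, z))
    return survivors
```

An empty result for a given N says that every admissible decomposition of w fails to pump, which is exactly what step (4) asks us to show in general.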

2.11. Equivalence relations and minimisation of DFA’s. 

We now describe a way to find a minimal version of any given DFA D; that 
is, a DFA D' which accepts the same language but has the smallest possible 
number of states. The underlying idea is that we take our original DFA D and 
group together states which are equivalent. 

Definition 2.41 (State equivalence in DFA’s). 

Let $D = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We call two states $p, q \in Q$ equivalent or 
indistinguishable if, for all $w \in \Sigma^*$, we have that $\delta(p, w) \in F$ if and only if 
$\delta(q, w) \in F$. We write this as $p \sim q$. That is, 

$p \sim q$ if and only if $\forall w \in \Sigma^* \, (\delta(p, w) \in F \Leftrightarrow \delta(q, w) \in F)$ 

If two states $p, q$ are not equivalent, then we say they are distinguishable, and 
we say that they are distinguished by $w$ if one of $\delta(p, w)$, $\delta(q, w)$ is accepting 
but the other is not. 

We do not require $\delta(p, w) = \delta(q, w)$ for all words $w$ in order for $p, q$ to be 
equivalent, just that they are always either both accepting or both non-accepting. 

Lemma 2.42. 

The relation ~ on states given in Definition 2.41 is an equivalence relation. In 
other words, it is: 

(1) Reflexive: $p \sim p$ for all $p \in Q$. 

(2) Symmetric: if $p \sim q$ then $q \sim p$. 

(3) Transitive: if $p \sim q$ and $q \sim r$, then $p \sim r$. 

Thus we can define the equivalence class of a state $p$, written $[p]$, as all the 
states equivalent to $p$. That is, 

$[p] := \{q \in Q \mid q \sim p\}$ 

This gives a partition of Q into disjoint equivalence classes. 

Proof. 1 and 2 are immediate. 

To prove 3: Let $w$ be a word on the alphabet of the DFA. If $p \sim q$ then 
$\delta(p, w)$, $\delta(q, w)$ are either both accepting or both non-accepting. Similarly, if 
$q \sim r$ then $\delta(q, w)$, $\delta(r, w)$ are either both accepting or both non-accepting. 
Thus $\delta(p, w)$, $\delta(r, w)$ are either both accepting or both non-accepting, and so 
$p \sim r$. □ 

Definition 2.43 (The table-filling algorithm for DFA’s). 

We describe the following algorithm, called the table-filling algorithm, for 
determining which states of a DFA $D = (Q, \Sigma, \delta, q_0, F)$ are distinguishable. 

We begin by drawing a table $T$, with the rows and columns indexed by 
elements of $Q$. The point is to mark, in the entry with coordinates $(p, q)$, whether $p$ 
and $q$ are distinguishable or not. As this is a symmetric property (and no state is 
distinguishable from itself), we need only consider the lower-left triangle of the 
table. We start with all entries being ‘empty’, and mark later entries by the 
following inductive process. 

Basis: 

We place a mark ‘x’ in every entry labelled by a pair of states $(p, q)$ with one 
of $p, q$ accepting and the other non-accepting. These states are distinguishable; 
$\epsilon$ will distinguish them. 

Inductive step: 

Take the table $T$ at the current point in the algorithm. Take any pair of states 
$p, q$ where entry $(p, q)$ is unmarked. If there is some symbol $a$ with $\delta(p, a) = r$ 
and $\delta(q, a) = s$ with $(r, s)$ already marked in $T$ (which corresponds to $r, s$ 
being distinguished states in $D$), then we know that $p$ and $q$ are distinguished 
states. This is because $r, s$ are distinguished (say by $w$), and thus precisely one 
of $\delta(r, w)$, $\delta(s, w)$ is accepting. But $\delta(p, aw) = \delta(r, w)$, and $\delta(q, aw) = \delta(s, w)$, 
and thus $aw$ distinguishes $p$ and $q$. So we place a new mark ‘x’ at entry $(p, q)$. 
Now repeat the inductive step. 

Conclusion: 

If we have filled the table $T$ sufficiently so that, for every pair of states $p, q$ where 
entry $(p, q)$ is unmarked, there is no symbol $a$ with $\delta(p, a) = r$ and $\delta(q, a) = s$ 
with $(r, s)$ already marked in $T$, then our algorithm halts. 
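
The basis, inductive step and halting condition translate quite directly into code. The following is a minimal sketch under some assumed conventions of our own: the transition function is a dictionary keyed by (state, symbol), each table entry is represented by an unordered pair (a frozenset) so that only one triangle of the table is stored, and the name distinguishable_pairs is hypothetical.

```python
from itertools import combinations


def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of marked (distinguishable) pairs, each as a frozenset {p, q}."""
    # Basis: mark every pair with exactly one accepting state.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    # Inductive step: keep sweeping the table until a full pass adds no new mark.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[(p, a)], delta[(q, a)]))
                if successors in marked:     # (r, s) already marked, so mark (p, q)
                    marked.add(pair)
                    changed = True
                    break
    return marked
```

For instance, take the three-state DFA over {0, 1} with states A, B, C, accepting set {C}, where reading 1 always leads to C, reading 0 swaps A and B, and reading 0 from C leads to A. The pairs {A, C} and {B, C} are marked in the basis, and {A, B} is never marked.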

Theorem 2.44 (Proof of the table-filling algorithm). 

Let D be a DFA. Then two states p,q correspond to a marked entry in the 
table-filling algorithm if and only if they are distinguished states in D. 





Proof. Let $D = (Q, \Sigma, \delta, q_0, F)$, and $T$ the table at the end of the table-filling 
algorithm. Clearly, if entry $(p, q)$ is marked in $T$, then $p$ and $q$ are distinguished 
states in $D$. This is because through the algorithm we can inductively construct 
a distinguishing word for $p$ and $q$. 

We call a pair of states $p, q$ a bad pair if $(p, q)$ is unmarked in $T$ but $p, q$ 
are distinguished states in $D$. We will proceed by contradiction to show there 
are no bad pairs. So, assume a bad pair exists. Now, over all possible bad 
pairs, take one such pair $p, q$ with the shortest possible distinguishing word 
$w = a_1 \ldots a_n$ (that is, if $p', q'$ are a bad pair distinguished by word $w'$, then 
$|w'| \geq |w|$). So precisely one of $\delta(p, w)$, $\delta(q, w)$ is accepting. 

Clearly, $w$ is not $\epsilon$, for if it were then $(p, q)$ would be marked in the basis step 
of the table-filling algorithm, and thus not be a bad pair. So $n \geq 1$. 

Now consider the states $r = \delta(p, a_1)$ and $s = \delta(q, a_1)$. Thus $r, s$ are 
distinguished by the word $a_2 \ldots a_n$ which is of length $n - 1$, and so by the minimality 
of $n$ we have that $r, s$ is not a bad pair. Thus the entry $(r, s)$ is marked in $T$. 
But once entry $(r, s)$ is marked, the table-filling algorithm will then eventually 
mark entry $(p, q)$, as $\delta(p, a_1) = r$ and $\delta(q, a_1) = s$, with entry $(r, s)$ already 
marked. 

Thus there are no bad pairs, and so every pair of states corresponding to an 
unmarked entry in T are actually indistinguishable in D. □ 

Part of the reason for studying equivalence of states is to take a DFA $D$ and 
‘group together’ the equivalent states, to construct a new DFA with fewer states 
but which accepts the same language. We do that now. 

Lemma 2.45. Let $p, q$ be two equivalent states of a DFA $D = (Q, \Sigma, \delta, q_0, F)$, 
and take any $a \in \Sigma$. Then $\delta(p, a)$, $\delta(q, a)$ are equivalent. 

Proof. Suppose $\delta(p, a) \not\sim \delta(q, a)$. Then they are distinguished by some word 
$w$. But then $p, q$ would be distinguished by the word $aw$; a contradiction since 
$p \sim q$. □ 

So we see that if we start with any two equivalent states $p, q$, then the 
transition function $\delta$ must take them to equivalent states for every symbol $a \in \Sigma$. 

Lemma 2.46. Let $p, q$ be two equivalent states of a DFA $D = (Q, \Sigma, \delta, q_0, F)$. 
Then $p$ is accepting if and only if $q$ is accepting ($p \in F \Leftrightarrow q \in F$). 

Proof. If one of $p, q$ were accepting but the other not, then the word $\epsilon$ would 
distinguish them. □ 

Definition 2.47 (DFA minimisation). 

Given a DFA $D = (Q, \Sigma, \delta, q_0, F)$, we define the minimal DFA for $D$, written 
$D/\sim$, as the DFA with: 

(1) Alphabet $\Sigma$ (the same as $D$). 

(2) States $Q' := \{[p] \mid p \in Q\}$. 

(3) Transition function $\delta'$ defined by $\delta'([p], a) := [\delta(p, a)]$. 

(4) Start state $q'_0 := [q_0]$. 

(5) Accepting states $F' := \{[p] \mid p \in F\}$. 

By Lemma 2.45, we have that $\delta'$ is indeed well-defined. That is, to decide 
where to send the state $[p]$ when reading symbol $a$, we just need to pick out one 
state in $[p]$ ($p$ itself will suffice), and see which equivalence class $\delta(p, a)$ lies in. 
Lemma 2.45 ensures that this always gives us precisely one equivalence class: 
$[\delta(p, a)]$. 

Observe that, by Lemma 2.46, the equivalence relation $\sim$ partitions $F$ into 
disjoint sets which do not intersect $Q \setminus F$. So the states of $D/\sim$ consist 
of collections of (equivalent) states which are either all accepting or all 
non-accepting. 
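
To make the quotient construction concrete, here is a minimal sketch that builds the five components of the minimal DFA from a given DFA. It assumes a dictionary encoding of the transition function and represents each class [p] as a frozenset of states; the function name minimise is hypothetical, and the marking loop is simply the table-filling algorithm repeated inline so that the example stands on its own.

```python
from itertools import combinations


def minimise(states, alphabet, delta, start, accepting):
    """Build the quotient DFA; each new state is a frozenset of equivalent old states."""
    # Mark distinguishable pairs by table-filling (basis, then repeated sweeps).
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked for a in alphabet):
                marked.add(frozenset((p, q)))
                changed = True

    def cls(p):
        # [p]: all states that are not distinguishable from p.
        return frozenset(q for q in states if q == p or frozenset((p, q)) not in marked)

    new_states = {cls(p) for p in states}
    # Well-defined by Lemma 2.45: any representative p of [p] gives the same class.
    new_delta = {(cls(p), a): cls(delta[(p, a)]) for p in states for a in alphabet}
    new_accepting = {cls(p) for p in accepting}
    return new_states, new_delta, cls(start), new_accepting
```

Representing a class by the set of its members means that two equal classes are automatically the same state, so no separate re-labelling step is needed.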

Lemma 2.48. Let $w$ be a word in a DFA $D$, $p$ a state in $D$, and take $D/\sim$ 
as constructed above. Then 

$\delta'([p], w) = [\delta(p, w)]$ 

Proof. We induct on $|w|$. Clearly, if $|w| = 0$ (i.e., $w = \epsilon$), then we have 

$\delta'([p], \epsilon) = [p] = [\delta(p, \epsilon)]$. 

Now suppose $\delta'([p], v) = [\delta(p, v)]$ for all $v$ with $|v| \leq n$, for some fixed $n$. Take 
$w$ with $|w| = n + 1$. Then $w = ua$ for some $u$ with $|u| = n$. So 

$\delta'([p], ua) = \delta'(\delta'([p], u), a)$ (definition of $\delta'$) 

$= \delta'([\delta(p, u)], a)$ (induction, as $|u| = n$) 

$= [\delta(\delta(p, u), a)]$ (definition of $\delta'$) 

$= [\delta(p, ua)]$ (definition of $\delta$) 

□ 

Thus the extended transition function $\delta'$ works as a natural extension of the 
extended transition function $\delta$. 

Theorem 2.49 (Equivalence of languages of minimal DFA’s). 

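Let $D$ be a DFA, and $D/\sim$ be its minimal DFA. Then $\mathcal{L}(D/\sim) = \mathcal{L}(D)$. 

Proof. Observe that $D$ and $D/\sim$ have the same alphabet (call it $\Sigma$). So take 
any word $w \in \Sigma^*$. Then: 

$w \in \mathcal{L}(D/\sim) \Leftrightarrow \delta'(q'_0, w) \in F'$ (definition of acceptance) 

$\Leftrightarrow \delta'([q_0], w) \in F'$ (definition of $q'_0$) 

$\Leftrightarrow [\delta(q_0, w)] \in F'$ (Lemma 2.48) 

$\Leftrightarrow \delta(q_0, w) \in F$ (Lemma 2.46) 

$\Leftrightarrow w \in \mathcal{L}(D)$ (definition of acceptance) 

□ 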

Lemma 2.50. Let $D$ be a DFA, and $D' = D/\sim$ be its minimal DFA. Then 
no two distinct states of $D/\sim$ are equivalent. 

Proof. Suppose $[p] \sim [q]$ in $D'$. Then, for every word $w$, we have that $\delta'([p], w)$, 
$\delta'([q], w)$ are either both accepting or both non-accepting in $D'$. Thus, $[\delta(p, w)]$, 
$[\delta(q, w)]$ are either both accepting or both non-accepting in $D'$ (Lemma 2.48). 
So then $\delta(p, w)$, $\delta(q, w)$ are either both accepting or both non-accepting in $D$ 
(Lemma 2.46). So $p \sim q$ in $D$, and thus $[p] = [q]$ in $D'$. □ 

Definition 2.51. Let $A, B$ be DFA’s. We say $A$ and $B$ are equivalent, 
written $A = B$, if, up to possible re-labelling of states, they have the same alphabet, 
transition function, set of states, start state, and accept states. 

So equivalent DFA’s are ‘functionally’ identical; they only differ on state 
names (which are only arbitrary labels). Given that we can count how many 
states there are in a DFA, and moreover that there are only finitely many ways 
to permute state names, we see that we can algorithmically compute whether 
two DFA’s are equivalent or not. 

Corollary 2.52. Let $D$ be a DFA. Then $(D/\sim)/\sim \; = D/\sim$ as DFA’s. 

Proof. Performing DFA minimisation on a DFA gives a new DFA whose states 
are a partitioning of the states of the old DFA into disjoint equivalence classes. 
But no two distinct states in $D/\sim$ are equivalent (Lemma 2.50). So performing 
DFA minimisation on $D/\sim$ partitions its states into sets of size 1, which gives 
the exact same set of states (up to re-labelling). Thus the alphabet, transition 
function, and states remain unchanged (modulo this state re-labelling). □ 

Theorem 2.53 (Removing inaccessible states from a DFA). 

There is an algorithm which takes a DFA $D = (Q_D, \Sigma, \delta_D, q_0, F_D)$ and produces 
a DFA $A = (Q_A, \Sigma, \delta_A, q_0, F_A)$ with no inaccessible states, for which $\mathcal{L}(A) = 
\mathcal{L}(D)$. 

Proof. Let $n = |Q_D|$. Then form the sequence $S_i$ of subsets of $Q_D$, $0 \leq i \leq n$, 
via 

$S_0 := \{q_0\}$ 

$S_{i+1} := S_i \cup \bigcup_{q \in S_i} \bigcup_{a \in \Sigma} \{\delta_D(q, a)\}$ 

So $S_{i+1}$ consists of $S_i$ together with the states which can be reached from $S_i$ 
with one transition step. At most $n$ steps are needed to reach any state, and so 
$S_n$ will be the set of all accessible states of $D$. So define $Q_A := Q_D \cap S_n$, 
$F_A := F_D \cap S_n$, and $\delta_A$ as the restriction of $\delta_D$ to $Q_A \times \Sigma$. □ 
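
As a sketch of how this proof becomes an algorithm, the following code grows the set of accessible states one transition step at a time, keeping the states already found at each stage. Two choices here are our own assumptions rather than part of the proof: the loop stops as soon as the set stops growing (instead of always running exactly n steps), and the names accessible_states and trim are hypothetical.

```python
def accessible_states(alphabet, delta, start):
    """Grow the set of states reachable from start until it stabilises."""
    current = {start}
    while True:
        # Add every state reachable from the current set in one transition step.
        nxt = current | {delta[(q, a)] for q in current for a in alphabet}
        if nxt == current:
            return current
        current = nxt


def trim(alphabet, delta, start, accepting):
    """Restrict a DFA to its accessible states; the language is unchanged."""
    keep = accessible_states(alphabet, delta, start)
    new_delta = {(q, a): delta[(q, a)] for q in keep for a in alphabet}
    return keep, new_delta, start, set(accepting) & keep
```

Stopping early is harmless: once a stage adds nothing new, every later stage would add nothing either.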

We now prove that minimal DFA’s uniquely define regular languages, up to 
re-naming of states. 

Theorem 2.54 (Minimality of minimal DFA’s). 

Let $D$ be a DFA with no inaccessible states, and suppose that $A$ is another DFA 
on the same alphabet as $D$ and for which $\mathcal{L}(D) = \mathcal{L}(A)$. Then $A$ has at least 
as many states as $D/\sim$. 

Moreover, if $A$ has the same number of states as $D/\sim$, then $A = D/\sim$. 

Proof. Suppose that $A$ has fewer states than $D' := D/\sim$; we proceed by 
contradiction. Take some DFA $B$ on the same alphabet as $D$ with the least 
number of states for which $\mathcal{L}(B) = \mathcal{L}(D)$. Now form the ‘disjoint union’ of the 
DFA’s $D', B$ as follows: With $B = (S, \Sigma, \delta, p_0, G)$, and $D' = (Q', \Sigma, \delta', q'_0, F')$ 
respectively, form a new DFA $U := (\overline{S} \cup Q', \Sigma, \rho, q'_0, \overline{G} \cup F')$, where the transition 
function $\rho$ is defined via $\delta$ and $\delta'$. (We need to mark the states of $B$ by an 
overline, to ensure that they are disjoint from those of $D'$, so that the states in 
$Q' \cup \overline{S}$ are all unique.) Picture this as taking the transition diagrams for $D', B$ 
and drawing them next to each other to get a new transition diagram (we take 
$q'_0$ as our start state, but it doesn’t really matter). 

Now run the table-filling algorithm on this disjoint union $U$, and observe that 
the start states of $D'$ and $B$ are equivalent as $\mathcal{L}(D') = \mathcal{L}(D) = \mathcal{L}(B)$ (Theorem 
2.49). Observe that if $p, q$ are equivalent states (in any DFA), then by Lemma 
2.45 their successors $\delta(p, a)$, $\delta(q, a)$ are also equivalent for any symbol $a$. By 
induction, $\delta(p, w)$, $\delta(q, w)$ are also equivalent for any word $w$. 

Since $D$ has no inaccessible states, then neither does $D'$. To see this, take a 
state $[q]$ in $D'$; there is some word $w$ with $\delta_D(q_0, w) = q$ as $D$ has no inaccessible 
states, and thus $\delta_{D'}([q_0], w) = [\delta_D(q_0, w)] = [q]$. Moreover, $B$ has no inaccessible 
states, or else we could form a new DFA $C$ with the inaccessible states of $B$ 
removed, for which $\mathcal{L}(C) = \mathcal{L}(B)$ (Theorem 2.53); this would contradict the 
minimality of $B$. 

So far we have: 

• The start states of $D'$ and $B$ are equivalent in their disjoint union DFA $U$. 

• The successors of any pair of equivalent states in $U$ are again equivalent. 

• Neither $D'$ nor $B$ has inaccessible states. 

Thus every state of $D'$ is equivalent to some state of $B$. To see this, let $p$ be 
some state of $D'$. Then, as $D'$ has no inaccessible states, there is some word 
$a_1 \ldots a_m$ which gives a path from the start state of $D'$ to $p$. But now the same 
word gives a path from the start state of $B$ to some state $q$ in $B$ (as $B$ is on 
the same alphabet as $D'$), and these states are thus equivalent via Lemma 2.45 
(as the start states of $D'$ and $B$ are equivalent, since $\mathcal{L}(D') = \mathcal{L}(B)$). 

Now, since B has fewer states than D' , then by the pigeonhole principle there 
must be two different states of D' which are equivalent to the same state of B. 
Thus these two states of D' are equivalent to each other. But two equivalent 
states of D' must be the same state (Lemma 2.50); a contradiction. So A must 
have at least as many states as D'. 

If A has the same number of states as D', then again we see that it cannot 
have any inaccessible states (and neither does D'). Also, neither A nor D' 
can have any equivalent states, by the first part of the theorem. Thus, in the 
disjoint union DFA of $A$ and $D'$ (as defined above), each state of $A$ is equivalent 
to precisely one state of $D'$, and vice-versa. As pairs of equivalent states in a 
DFA are preserved under the transition function, we have that each state of 
A ‘matches’ precisely one state of D' (same symbols transitioning in, from 
matching states; same symbols transitioning out, to matching states). □ 

Theorem 2.55 (Testing equivalence of regular languages). 

There is an algorithm that, on input of DFA’s $D_1, D_2$, determines whether or 
not they define the same regular language. 

Proof. Let $D_1 = (Q_1, \Sigma_1, \delta_1, q_{1,0}, F_1)$, $D_2 = (Q_2, \Sigma_2, \delta_2, q_{2,0}, F_2)$ respectively. 
Set $L_i := \mathcal{L}(D_i)$ for $i = 1, 2$. Now replace $D_1$ with a new DFA $A_1$ with: 

• Alphabet $\Sigma_1 \cup \Sigma_2$. 

• States $Q_1 \cup \{z_1\}$ (some symbol $z_1$ not in $Q_1$). 

• Start state $q_{1,0}$. 

• Accept states $F_1$. 

• Transition function $\rho_1$ which extends $\delta_1$ by defining $\rho_1(q, a) := z_1$ for all 
$q \in Q_1$ and all $a \in \Sigma_2 - \Sigma_1$, and $\rho_1(z_1, a) := z_1$ for all $a \in \Sigma_1 \cup \Sigma_2$. 

Replace $D_2$ with a new DFA $A_2$ in an analogous manner, interchanging the 
subscripts 1 and 2. Then remove the inaccessible states of each $A_i$ (Theorem 
2.53), and call the resulting DFA $B_i$. It is then immediate that $\mathcal{L}(B_i) = \mathcal{L}(A_i) = 
\mathcal{L}(D_i) = L_i$ for $i = 1, 2$. 

Now form $B_1/\sim$ and $B_2/\sim$ and compute if they are equivalent. As they 
are on the same alphabet, then by Theorem 2.54 this occurs if and only if 
$L_1 = L_2$. □ 
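
In practice one can decide the same question without building the two minimal DFA’s. The sketch below uses a different technique from the proof above: it explores pairs of states reached by the same word from the two start states, and reports a difference as soon as exactly one state of a pair is accepting. It does borrow the proof’s idea of sending symbols outside a DFA’s own alphabet to a fresh trap state. The tuple encoding of a DFA and the name same_language are assumptions made for illustration.

```python
def same_language(dfa1, dfa2):
    """Each DFA is (alphabet, delta, start, accepting), with delta a dictionary
    mapping (state, symbol) to a state."""
    (sigma1, delta1, start1, acc1), (sigma2, delta2, start2, acc2) = dfa1, dfa2
    alphabet = set(sigma1) | set(sigma2)
    SINK = object()   # fresh non-accepting trap state, in the role of z_1 and z_2

    def step(delta, state, a):
        # Symbols outside a DFA's own alphabet lead to the trap state.
        return SINK if state is SINK else delta.get((state, a), SINK)

    seen = {(start1, start2)}
    todo = [(start1, start2)]
    while todo:
        p, q = todo.pop()
        if (p in acc1) != (q in acc2):
            return False              # some word is accepted by exactly one DFA
        for a in alphabet:
            pair = (step(delta1, p, a), step(delta2, q, a))
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True
```

There are only finitely many pairs of states, so the search always halts.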
3. PUSHDOWN AUTOMATA AND CONTEXT-FREE LANGUAGES 

Having dealt with finite-state automata, we saw that they were very 
straightforward to work with, but were highly limited in the languages they could 
recognise. The reason for this is that they have a bounded memory: even 
though they can recognise arbitrarily long words, in a sense they only ever 
see a ‘finite number of options’. An example of this is computing the remainder 
$n \bmod m$, where we only need to keep track of the remainder as we read along 
the input word, and this can only take one of a finite number of values. 

We now describe a slightly more general finite-state machine; one with an 
unbounded ‘memory stack’, which can compute not only languages, but also 
sentences with some form of structure. This new form of computation is still 
weaker than register machines, but can do more than DFA’s. 

3.1. Context-free grammars and context-free languages. 

We will start in the reverse order this time, and will define context-free 
grammars (akin to regular expressions) as a way to generate languages algebraically. 
Later, we will show that these give the same languages as a more general 
finite-state machine, known as a pushdown automaton. 

Definition 3.1 (Context-free grammar). 

We define a context-free grammar (CFG) to be a quadruple $(N, \Sigma, P, S)$, with 

(1) A finite set of nonterminal symbols $N$. 

(2) A finite set of terminal symbols $\Sigma$, disjoint from $N$. 

(3) A finite set of productions $P \subseteq N \times (N \cup \Sigma)^*$. 

(4) A start symbol $S \in N$. 

We will often use capital letters $A, B, C, \ldots$ for nonterminal symbols, and 
lower case letters $a, b, c, \ldots$ for terminal symbols. Words in $(N \cup \Sigma)^*$ will often 
be written with Greek letters $\alpha, \beta, \gamma, \ldots$. 

We will often write productions $(A, \alpha)$ as $A \to \alpha$, to emphasise that there is 
some sort of substitution occurring here. For convenience, we will collect 
together productions with the same first (nonterminal) symbol, and use a vertical 
bar | to separate all the words associated to that nonterminal. For example, if 
we had productions $(A, \alpha_1)$, $(A, \alpha_2)$, $(A, \alpha_3)$, we would write this as 

$A \to \alpha_1 \mid \alpha_2 \mid \alpha_3$ 

Example 3.2. Here is a small example of a CFG: 

$N = \{S\}$ 

$\Sigma = \{a, b\}$ 

$P = \{(S, aSb), (S, \epsilon)\}$ 

In later cases we would write $P$ as $P = \{S \to aSb \mid \epsilon\}$, according to our 
convention above. 

Later, we will see what this example actually represents. 

Definition 3.3 (CFG terminology). 

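Let $G = (N, \Sigma, P, S)$ be a CFG. 

(1) Let $\alpha, \beta \in (N \cup \Sigma)^*$. We say that $\beta$ is derivable from $\alpha$ in one step if $\beta$ 
can be obtained from $\alpha$ by replacing some nonterminal $A$ occurring in 
$\alpha$ with $\gamma \in (N \cup \Sigma)^*$, where $(A, \gamma) \in P$. That is, there exist $\alpha_1, \alpha_2 \in 
(N \cup \Sigma)^*$ and a production $A \to \gamma$ such that $\alpha = \alpha_1 A \alpha_2$ and $\beta = \alpha_1 \gamma \alpha_2$. 
We write this as 

$\alpha \xrightarrow[G]{1} \beta$ 

(indicating that, in the CFG $G$, we require one such substitution to go 
from $\alpha$ to $\beta$). 

(2) For $\alpha, \beta, \gamma \in (N \cup \Sigma)^*$, we inductively define the following notation: 

$\alpha \xrightarrow[G]{0} \alpha$ for any $\alpha$. 

$\alpha \xrightarrow[G]{n+1} \beta$ if there exists $\gamma$ with $\alpha \xrightarrow[G]{n} \gamma$ and $\gamma \xrightarrow[G]{1} \beta$. 

$\alpha \xrightarrow[G]{*} \beta$ if there exists $n \geq 0$ with $\alpha \xrightarrow[G]{n} \beta$. 

(3) If $\alpha \xrightarrow[G]{*} \beta$ then we say that $\beta$ is derivable from $\alpha$. 

(4) If $\beta$ is derivable from $\alpha$, then a derivation (of length $n$) of $\beta$ from $\alpha$ is 
a sequence of $n$ steps from $\alpha$ to $\beta$. That is, a sequence $\gamma_1, \ldots, \gamma_{n-1} \in 
(N \cup \Sigma)^*$ such that 

$\alpha \xrightarrow[G]{1} \gamma_1 \xrightarrow[G]{1} \cdots \xrightarrow[G]{1} \gamma_{n-1} \xrightarrow[G]{1} \beta$ 

(5) A word $w \in (N \cup \Sigma)^*$ which is derivable from the start symbol $S$ is said 
to be in sentential form. 

(6) A word $w \in (N \cup \Sigma)^*$ in sentential form is said to be a sentence if it 
contains only terminal symbols. That is, if $w \in \Sigma^*$. 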


Definition 3.4 (Language of a CFG). 

Let $G = (N, \Sigma, P, S)$ be a CFG. The language generated by $G$, $\mathcal{L}(G)$, is the set 
of all sentences derivable by $G$. That is, 

$\mathcal{L}(G) := \{w \in \Sigma^* \mid S \xrightarrow[G]{*} w\}$ 

A language $L$ is a context-free language (CFL) if $L = \mathcal{L}(G)$ for some CFG $G$. 

By keeping our convention of writing nonterminals in uppercase (with start 
symbol always given by $S$), and terminals in lowercase, we can fully describe a 
CFL from the productions $P$ of a CFG describing it. That is, we don’t need to 
specify $N, \Sigma, S$ explicitly (see footnote 15). For example, we can describe the 
language of the CFG from Example 3.2 simply by writing the productions, which are 

$S \to aSb \mid \epsilon$ 

The only terminals that can appear in a CFL are terminals appearing in the 
productions, so there is no need to explicitly give the finite alphabet $\Sigma$. 


Example 3.5. The set $X = \{a^n b^n \mid n \geq 0\}$ is a CFL, generated by the CFG 
$G$ from Example 3.2. That is, $X$ is generated by the grammar 

$S \to aSb \mid \epsilon$ 

To see that $X \subseteq \mathcal{L}(G)$, we induct on $n$ to show that 

$S \xrightarrow[G]{n+1} a^n b^n$ 

Conversely, an induction on the length of derivations shows that $\mathcal{L}(G) \subseteq X$. 
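
It can be instructive to let a machine carry out derivations for a small grammar. The following minimal sketch enumerates the short sentences of a CFG by always expanding the leftmost nonterminal. Its conventions are assumptions of ours: nonterminals are the keys of a dictionary, the empty string stands for the empty word, and sentential forms longer than max_form_len are discarded to guarantee termination (so for grammars whose derivations pass through long sentential forms, some short sentences may be missed).

```python
from collections import deque


def generate(productions, start="S", max_len=6, max_form_len=None):
    """Return the sentences with at most max_len symbols found by leftmost derivations."""
    if max_form_len is None:
        max_form_len = 2 * max_len + 2
    sentences, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if form is a sentence.
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            sentences.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(1 for c in new if c not in productions)
            # Prune forms with too many terminals, or that are simply too long.
            if terminals <= max_len and len(new) <= max_form_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sentences


# The grammar S -> aSb | epsilon from Example 3.2.
words = generate({"S": ["aSb", ""]}, max_len=6)   # {"", "ab", "aabb", "aaabbb"}
```

Restricting to leftmost derivations loses nothing, since every sentence of a CFG has a leftmost derivation.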


15 This bypasses unused nonterminals and terminals, but we get the same language. 

Note that this language is not a regular language (Example 2.40), and so we 
immediately see that CFL’s can be non-regular. On example sheet 4 we will 
see that every regular language is a CFL, and thus there is a strict inclusion 
(regular language $\Rightarrow$ CFL) of these types of languages. 

Example 3.6. The set $X = \{w \in \{a, b\}^* \mid w = w^R\}$ is a CFL, generated by 
the CFG $G$ defined by 

$S \to aSa \mid bSb \mid a \mid b \mid \epsilon$ 

The first two productions $S \to aSa \mid bSb$ give the symmetry of $X$ when read 
from the left and right sides simultaneously. The next two productions $S \to a \mid b$ 
‘finish off’ palindromes of odd length, and the final production $S \to \epsilon$ ‘finishes 
off’ palindromes of even length. 


Now we look at a more involved example, which is very important in 
real-world computation (see footnote 16). This is the set of balanced parentheses; 
strings of [’s and ]’s which obey certain containment rules. 

Definition 3.7 (Balanced parentheses). 

Take any string $x \in \{[, ]\}^*$. We define 

(1) $L(x) := \#_{[}(x)$ = the number of left parentheses [ occurring in $x$. 

(2) $R(x) := \#_{]}(x)$ = the number of right parentheses ] occurring in $x$. 

We say that $x$ is balanced if 

(i) $L(x) = R(x)$. 

(ii) For all prefixes $y$ of $x$, we have that $L(y) \geq R(y)$. 
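
Conditions (i) and (ii) can be checked in a single left-to-right pass by tracking the difference L(y) − R(y) over the prefixes y of x. The following is a minimal sketch; the function name is_balanced is hypothetical.

```python
def is_balanced(x):
    """Check conditions (i) and (ii) for a string over the symbols '[' and ']'."""
    height = 0                      # L(y) - R(y) for the prefix y read so far
    for c in x:
        if c == "[":
            height += 1
        elif c == "]":
            height -= 1
        else:
            raise ValueError(f"unexpected symbol {c!r}")
        if height < 0:              # some prefix has R(y) > L(y), so (ii) fails
            return False
    return height == 0              # condition (i): L(x) = R(x)
```

Note that only a single counter is needed here, but it is unbounded, which hints at why no DFA can perform this check.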

Lemma 3.8 (A CFG for balanced parentheses). 

Let $G$ be the CFG 

$S \to [S] \mid SS \mid \epsilon$ 

Then $\mathcal{L}(G) = \{x \in \{[, ]\}^* \mid x \text{ is balanced}\}$, and so this is a CFL. 


A quick application of the pumping lemma for regular languages shows that 
the language of balanced parentheses is not regular. Later, we will see another 
version of the pumping lemma, this time for CFL’s. 

To show that $x \in \mathcal{L}(G) \Rightarrow x$ is balanced is an induction on the length of the 
shortest possible derivation for $x$. To show $x$ is balanced $\Rightarrow x \in \mathcal{L}(G)$ is an 
induction on $|x|$. We give the proof here in full, but it is long and technical. 


Proof. 1. $x \in \mathcal{L}(G) \Rightarrow x$ balanced: 

We will show that if $\alpha \in (N \cup \Sigma)^*$, and $S \xrightarrow[G]{*} \alpha$, then $\alpha$ satisfies (i) and (ii) 
from Definition 3.7. We do this by induction on the length of this derivation. 

Basis: 

If $S \xrightarrow[G]{0} \alpha$, then $\alpha = S$. But $S$ contains no occurrence of [ or ], and so satisfies 
(i) and (ii). 

Induction: 

Suppose our assumption holds for all $k \leq n$, and suppose $S \xrightarrow[G]{n+1} \alpha$. Then we 
have $\beta$ such that 

$S \xrightarrow[G]{n} \beta \xrightarrow[G]{1} \alpha$ 

By induction, we have that $\beta$ satisfies (i) and (ii). Now, there are three possible 
productions that we could apply to $\beta$ to get $\alpha$; we show that each preserves (i) 
and (ii). 

16 If you ever want your C++ code to compile. 

If we applied a production of the form $S \to SS$, or $S \to \epsilon$, then we are 
automatically done; neither of these productions changes the ordering of parentheses. 
In particular, we have (for either case) some pair $\beta_1, \beta_2 \in (N \cup \Sigma)^*$ such that 

$\beta = \beta_1 S \beta_2$ and $\alpha = \begin{cases} \beta_1 \beta_2 & \text{if we applied } S \to \epsilon \\ \beta_1 SS \beta_2 & \text{if we applied } S \to SS \end{cases}$ 

In either case, $\alpha$ satisfies (i) and (ii), as $\beta$ does. 

So suppose that instead we applied the production $S \to [S]$. Then there exist 
$\beta_1, \beta_2 \in (N \cup \Sigma)^*$ such that 

$\beta = \beta_1 S \beta_2$ and $\alpha = \beta_1 [S] \beta_2$ 

Then clearly $L(\alpha) = L(\beta) + 1 = R(\beta) + 1 = R(\alpha)$, as (i) holds in $\beta$ by the 
induction hypothesis, and so holds in $\alpha$ also. 

Now, to show (ii) holds in $\alpha$, take any prefix $y$ of $\alpha$. We consider all the 
subcases: 

(1) $y$ is a prefix of $\beta_1$, in which case $y$ is a prefix of $\beta$, so (ii) holds for $y$ by 
the induction hypothesis. 

(2) $y$ is a prefix of $\beta_1 [S$, but not of $\beta_1$, in which case: 

$L(y) = L(\beta_1) + 1 \geq R(\beta_1) + 1 > R(\beta_1) = R(y)$ 

(3) $y = \beta_1 [S] \delta$ where $\delta$ is a prefix of $\beta_2$, in which case 

$L(y) = L(\beta_1 S \delta) + 1 \geq R(\beta_1 S \delta) + 1 = R(y)$ 

In all these subcases, we have $L(y) \geq R(y)$, and so (ii) holds in $\alpha$. So we’re 
done. 

2. $x$ is balanced $\Rightarrow x \in \mathcal{L}(G)$: 

We do this by induction on $|x|$. 

Basis: 

If $|x| = 0$, then $x = \epsilon$ (which is balanced) and so can be formed from $S$ by the 
single production $S \to \epsilon$. 

Induction: 

We break this up into two cases: 

Case a): there exists a proper prefix $y$ of $x$ (i.e., $0 < |y| < |x|$) satisfying (i) and 
(ii). In this case, we have $x = yz$ for some $z$, with $0 < |z| < |x|$, and $z$ satisfies 
(i) and (ii) as well, as 

$L(z) = L(x) - L(y) = R(x) - R(y) = R(z)$ 

and for any prefix $w$ of $z$ we have 

$L(w) = L(yw) - L(y)$ 

$\geq R(yw) - R(y)$ since $yw$ is a prefix of $x$ and $L(y) = R(y)$ 

$= R(w)$ 

By induction, we thus have that $y, z \in \mathcal{L}(G)$ (that is, $S \xrightarrow[G]{*} y$ and $S \xrightarrow[G]{*} z$). 
So we can derive $x$ from $S$ by first applying the production $S \to SS$, and then 
using the above derivations, via 

$S \xrightarrow[G]{1} SS \xrightarrow[G]{*} yS \xrightarrow[G]{*} yz = x$ 

Case b): no such prefix $y$ exists. 

Thus $x = [z]$ (exercise: check this), for some $z$ satisfying (i) and (ii). $z$ satisfies 
(i) as we have 

$L(z) = L(x) - 1 = R(x) - 1 = R(z)$ 

and $z$ satisfies (ii) since, for every non-empty prefix $u$ of $z$, we have 

$L(u) - R(u) = L([u) - 1 - R([u) \geq 0$ 

(since $L([u) - R([u) \geq 1$ as we are in case b) ). By the induction hypothesis, 
$S \xrightarrow[G]{*} z$. Combining this derivation with the single production $S \to [S]$, we get 
the following derivation of $x$: 

$S \xrightarrow[G]{1} [S] \xrightarrow[G]{*} [z] = x$ 

So every balanced word can be derived. 


□ 


3.2. The Chomsky normal form. 

We give a way of converting any CFG to one of a particular form, called a 
Chomsky normal form (see footnote 17). All our productions will have a particular 
format, and this will come in handy for later proofs as we can assume that our CFG is 
structured in a way that is easy to work with. 

Definition 3.9 (Chomsky normal form). 

A CFG $G = (N, \Sigma, P, S)$ is said to be in Chomsky normal form (CNF) if all 
productions are of the form 

$A \to BC$ or $A \to a$ 
where $A, B, C \in N$ and $a \in \Sigma$. 

Example 3.10. The following CFG is in Chomsky normal form: 

$S \to AB \mid AC \mid SS$, $C \to SB$, $A \to [$, $B \to ]$ 

Later, we will see that this CFG gives the CFL of all balanced parentheses from 
Definition 3.7. 

Observe that no CFG in CNF can generate the empty word $\epsilon$. We will now 
show that this is the only limitation of this normal form, in the sense that every 
CFG has at least one corresponding CNF which generates the same language 
(minus the empty word $\epsilon$). 

Definition 3.11 ($\epsilon$- and unit productions). 

Let $G = (N, \Sigma, P, S)$ be a CFG. 

(1) An $\epsilon$-production is a production of the form $A \to \epsilon$, for some $A \in N$. 

(2) A unit production is a production of the form $A \to B$, for some 
$A, B \in N$. 

$\epsilon$- and unit productions are a hindrance to finding CNF’s for CFG’s. We show 
that we can always modify a CFG so as to remove all $\epsilon$- and unit productions, 
and still end up with exactly the same CFL (minus $\{\epsilon\}$). Recall that, for sets 
$A, B$ we write the set difference $A - B$ to denote the set $(A \cup B) \setminus B$. 


17 Named after the linguist, logician, philosopher and political activist Noam Chomsky, 
whom you may have seen on the news (but probably not for his mathematical work). 

Theorem 3.12 (Removal of $\epsilon$- and unit productions). 

Let $G = (N, \Sigma, P, S)$ be a CFG. Then we can construct a CFG $G'$, with no $\epsilon$- 
or unit productions, for which $\mathcal{L}(G') = \mathcal{L}(G) - \{\epsilon\}$. 

Proof. Let $\hat{P}$ be the smallest set of productions containing $P$ and closed under 
the following rules: 

(1) If $A \to \alpha B \beta$ and $B \to \epsilon$ are in $\hat{P}$, then $A \to \alpha\beta$ is in $\hat{P}$. 

(2) If $A \to B$ and $B \to \gamma$ are in $\hat{P}$, then $A \to \gamma$ is in $\hat{P}$. 

We can construct $\hat{P}$ inductively: let $P_0 = P$ and in each iteration we take $P_i$ 
and form $P_{i+1}$ by adding all the productions needed to satisfy (1), (2) above 
for $P_i$. As $P$ is finite, and as the right hand side of each added production is 
no longer than the right hand side of an existing production, we get that this 
series eventually stabilises in finitely many iterations (i.e., $P_n = P_{n+1}$ for some 
$n$). When this occurs, there is nothing to add, so we set $\hat{P} := P_n$. 

Now define a new CFG $\hat{G} := (N, \Sigma, \hat{P}, S)$. Since $P \subseteq \hat{P}$, every derivation of 
$G$ is a derivation of $\hat{G}$, and so $\mathcal{L}(G) \subseteq \mathcal{L}(\hat{G})$. But now we can conclude that 
$\mathcal{L}(G) = \mathcal{L}(\hat{G})$, as every new production in $P_{i+1}$ not in $P_i$ can be simulated by 
two productions in $P_i$ (of the form in (1) or (2)). Thus we can ‘pull back’ (through 
the $P_i$’s) any derivation in $\hat{G}$ to a (possibly longer) derivation in $G$. 

We now show that we can remove all the $\epsilon$- and unit productions from $\hat{P}$, 
and not change the language of the CFG (apart from removing $\epsilon$). 

Take any $w \in \Sigma^*$ with $w \neq \epsilon$, and consider a minimal length derivation 
$S \xrightarrow[\hat{G}]{*} w$. Assume that an $\epsilon$-production $B \to \epsilon$ is used in this derivation, and so 

$S \xrightarrow[\hat{G}]{*} \gamma B \delta \xrightarrow[\hat{G}]{1} \gamma\delta \xrightarrow[\hat{G}]{*} w$ 

At least one of $\gamma, \delta$ is not $\epsilon$, otherwise $w$ would be. So that particular occurrence 
of $B$ must have appeared earlier in the derivation via a production $A \to \alpha B \beta$, 
and so we have that our minimal-length derivation looks like 

$S \xrightarrow[\hat{G}]{m} \eta A \theta \xrightarrow[\hat{G}]{1} \eta \alpha B \beta \theta \xrightarrow[\hat{G}]{n} \gamma B \delta \xrightarrow[\hat{G}]{1} \gamma\delta \xrightarrow[\hat{G}]{k} w$ 

for some $m, n, k \geq 0$. But by rule (1), the production $A \to \alpha\beta$ is also in $\hat{P}$, and 
so we have a strictly shorter derivation 

$S \xrightarrow[\hat{G}]{m} \eta A \theta \xrightarrow[\hat{G}]{1} \eta \alpha \beta \theta \xrightarrow[\hat{G}]{n} \gamma\delta \xrightarrow[\hat{G}]{k} w$ 

contradicting the minimality of our original derivation. So we can ‘discard’ all 
$\epsilon$-productions from $\hat{P}$, and not change $\mathcal{L}(\hat{G})$ (apart from removing $\epsilon$). 

Similarly, now assume that a unit production $A \to B$ is used in this 
minimal-length derivation of $w$, say 

$S \xrightarrow[\hat{G}]{*} \alpha A \beta \xrightarrow[\hat{G}]{1} \alpha B \beta \xrightarrow[\hat{G}]{*} w$ 

Eventually, that particular occurrence of $B$ will be replaced using some 
production $B \to \gamma$ (as $B$ is non-terminal). So we have the derivation 

$S \xrightarrow[\hat{G}]{m} \alpha A \beta \xrightarrow[\hat{G}]{1} \alpha B \beta \xrightarrow[\hat{G}]{n} \eta B \theta \xrightarrow[\hat{G}]{1} \eta \gamma \theta \xrightarrow[\hat{G}]{k} w$ 

for some $m, n, k \geq 0$. But by rule (2), the production $A \to \gamma$ is also in $\hat{P}$, and 
so we have a strictly shorter derivation 

$S \xrightarrow[\hat{G}]{m} \alpha A \beta \xrightarrow[\hat{G}]{1} \alpha \gamma \beta \xrightarrow[\hat{G}]{n} \eta \gamma \theta \xrightarrow[\hat{G}]{k} w$ 

contradicting the minimality of our original derivation. So we can ‘discard’ all 
unit productions from $\hat{P}$, and not change $\mathcal{L}(\hat{G})$. 

So, by discarding all $\epsilon$- and unit productions from $\hat{P}$, we end up with a CFG $G'$ 
with no $\epsilon$- or unit productions, for which $\mathcal{L}(G') = \mathcal{L}(\hat{G}) - \{\epsilon\} = \mathcal{L}(G) - \{\epsilon\}$. □ 
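
The closure construction in this proof can be carried out mechanically. The sketch below is a minimal illustration under our own encoding assumptions: a production is a pair (A, rhs), nonterminals are single uppercase letters, the empty string stands for the empty word, and the function name remove_epsilon_and_unit is hypothetical.

```python
def remove_epsilon_and_unit(productions):
    """Close a set of productions under rules (1) and (2), then discard all
    epsilon- and unit productions."""

    def is_unit(rhs):
        return len(rhs) == 1 and rhs.isupper()

    closed = set(productions)
    changed = True
    while changed:
        new = set()
        for head, rhs in closed:
            # Rule (1): A -> alpha B beta and B -> epsilon give A -> alpha beta.
            for i, symbol in enumerate(rhs):
                if (symbol, "") in closed:
                    new.add((head, rhs[:i] + rhs[i + 1:]))
            # Rule (2): A -> B and B -> gamma give A -> gamma.
            if is_unit(rhs):
                new.update((head, gamma) for b, gamma in closed if b == rhs)
        changed = not new <= closed     # stop once an iteration adds nothing
        closed |= new
    return {(head, rhs) for head, rhs in closed if rhs != "" and not is_unit(rhs)}
```

For the balanced parentheses grammar with productions S → [S], S → SS and S → ε, the closure adds S → [] and the unit production S → S; after discarding, the productions S → [S], S → SS and S → [] remain, which generate the non-empty balanced strings.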


Lemma 3.13.

Let $G = (N, \Sigma, P, S)$ be a CFG. Then we can construct a CFG $G' = (N', \Sigma, P', S)$
with the same terminals $\Sigma$, with $\mathcal{L}(G') = \mathcal{L}(G)$, and in which every production
is of the form

\[ A \to a \quad \text{or} \quad A \to B_1 \cdots B_k, \; k \geq 1 \]

for some $A, B_1, \ldots, B_k \in N'$ and $a \in \Sigma \cup \{\epsilon\}$.

Proof. For each terminal $a \in \Sigma$, we add to $N$ a new nonterminal $A_a$ (distinct
from all the existing nonterminals) and a production $A_a \to a$. Call this new set
of nonterminals $N'$. Now replace all occurrences of $a$ on the right hand side of
productions in $P$ with $A_a$, except in productions already of the form $B \to a$. Call
the new set of productions $P'$; then all productions in this set are of one of two
forms:

\[ A \to a \quad \text{or} \quad A \to B_1 \cdots B_k, \; k \geq 1 \]

for some $A, B_1, \ldots, B_k \in N'$ and $a \in \Sigma \cup \{\epsilon\}$. Now set $G' := (N', \Sigma, P', S)$; we
show that $\mathcal{L}(G') = \mathcal{L}(G)$.

To see that $\mathcal{L}(G) \subseteq \mathcal{L}(G')$, observe that a production from $P$ with terminals
and nonterminals in the right hand side can be reproduced by first applying
the corresponding production in $P'$ with only nonterminals in the right hand
side, and then applying productions of the form $A_a \to a$ to recover the terminals.

To see that $\mathcal{L}(G') \subseteq \mathcal{L}(G)$, observe that derivations of sentences with $P'$ can
be simulated by $P$: each $A_a$ can only ever be rewritten to $a$. □
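The replacement step in the proof above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the notes: a production is encoded as a pair `(head, body)` with `body` a tuple of symbols (the empty tuple standing for $\epsilon$), and the new nonterminal $A_a$ is given the hypothetical name `"A_" + a`, which is assumed not to clash with an existing nonterminal.

```python
def lift_terminals(productions, terminals):
    """Sketch of the construction in Lemma 3.13.

    productions: list of (head, body) pairs, body a tuple of symbols.
    terminals:   set of terminal symbols.
    Returns a list of productions of the form A -> a or A -> B_1 ... B_k.
    """
    new_productions = []
    for head, body in productions:
        # Productions already of the form B -> a (a a terminal or epsilon) are kept.
        if len(body) <= 1 and all(s in terminals for s in body):
            new_productions.append((head, body))
            continue
        # Otherwise every terminal a on the right hand side becomes A_a.
        new_body = tuple("A_" + s if s in terminals else s for s in body)
        new_productions.append((head, new_body))
    # Add the production A_a -> a for each terminal a.
    for a in sorted(terminals):
        new_productions.append(("A_" + a, (a,)))
    return new_productions


# The grammar S -> [S] | SS | epsilon, encoded as above.
BRACKETS = [("S", ("[", "S", "]")), ("S", ("S", "S")), ("S", ())]
LIFTED = lift_terminals(BRACKETS, {"[", "]"})
```

Here `LIFTED` contains `S -> A_[ S A_]`, `S -> S S`, `S -> epsilon`, together with `A_[ -> [` and `A_] -> ]`.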

Theorem 3.14 (Realising Chomsky normal form).

From a CFG $G = (N, \Sigma, P, S)$ we can construct an associated CFG $G_{\mathrm{Chom}}$ in
CNF such that

\[ \mathcal{L}(G_{\mathrm{Chom}}) = \mathcal{L}(G) - \{\epsilon\} \]

Proof. Take $G$ and construct the CFG $G' = (N, \Sigma, P', S)$ from Theorem 3.12
with no $\epsilon$- or unit productions, for which $\mathcal{L}(G') = \mathcal{L}(G) - \{\epsilon\}$. Now apply the
construction of Lemma 3.13 to $G'$, to get a CFG $G'' = (N', \Sigma, P'', S)$ whose
productions are all of the form

\[ A \to a \quad \text{or} \quad A \to B_1 \cdots B_k, \; k \geq 2 \]

for some $A, B_1, \ldots, B_k \in N'$ and $a \in \Sigma$. Observe that, since $P'$ has no $\epsilon$- or unit
productions, we have $a \neq \epsilon$ and $k > 1$ (this comes from the construction
in Lemma 3.13).

Now, in $P''$, for any production of the form

\[ A \to B_1 \cdots B_k \]

with $k \geq 3$ (and thus with each $B_i$ a nonterminal), we introduce a new nonterminal
$C$ and replace the original production with the following two productions:

\[ A \to B_1 C \quad \text{and} \quad C \to B_2 \cdots B_k \]

Keep re-applying the above step until the right hand sides of all productions are
of length at most 2. Call the resulting set of nonterminals $N''$, and the resulting
set of productions $P'''$. Then it is immediate that $G_{\mathrm{Chom}} := (N'', \Sigma, P''', S)$ is
in CNF, and moreover by the discussion above we have that

\[ \mathcal{L}(G_{\mathrm{Chom}}) = \mathcal{L}(G'') = \mathcal{L}(G') = \mathcal{L}(G) - \{\epsilon\} \]

□
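The last step of this proof (splitting long right hand sides) is mechanical, and a minimal Python sketch may help. It assumes the same hypothetical encoding of productions as `(head, body)` pairs with `body` a tuple of nonterminals, and it names the fresh nonterminals `C1`, `C2`, ... on the assumption that these names are not already in use.

```python
def binarise(productions):
    """Repeatedly replace A -> B_1 ... B_k (k >= 3) by A -> B_1 C and C -> B_2 ... B_k."""
    result = []
    work = list(productions)
    fresh = 0
    while work:
        head, body = work.pop(0)
        if len(body) <= 2:
            result.append((head, body))        # already short enough
        else:
            fresh += 1
            c = "C" + str(fresh)               # new nonterminal (assumed unused)
            result.append((head, (body[0], c)))
            work.insert(0, (c, body[1:]))      # the remainder may need splitting again
    return result


SHORT = binarise([("S", ("W", "X", "Y", "Z"))])
```

For the single production `S -> W X Y Z`, the sketch produces `S -> W C1`, `C1 -> X C2` and `C2 -> Y Z`.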


Example 3.15. Take the following CFG from Example 3.8:

\[ S \to [S] \mid SS \mid \epsilon \]

which gives the language of all balanced parentheses [ ]. We first apply the
construction of Theorem 3.12 to remove all $\epsilon$- and unit productions to get the
CFG

\[ S \to [S] \mid SS \mid [\,] \]

(the only $\epsilon$-production was $S \to \epsilon$, which we replaced with $S \to [\,]$. Also, there
were no unit productions). This CFG generates the language of all non-empty
balanced parentheses.

Now we follow the construction of Theorem 3.14 to build a CFG in CNF for
this language. First, we add nonterminals $A, B$ and replace the above productions
with

\[ S \to ASB \mid SS \mid AB, \quad A \to [\,, \quad B \to \,] \]

Finally, we add a new nonterminal $C$ and replace $S \to ASB$ with $S \to AC$ and
$C \to SB$. So we have the CFG

\[ S \to AC \mid SS \mid AB, \quad C \to SB, \quad A \to [\,, \quad B \to \,] \]

which is in CNF and generates the language of all non-empty balanced parentheses.
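One way to sanity-check the CNF grammar above is to run a membership test on a few words. The sketch below uses the standard CYK dynamic-programming algorithm, which is not part of these notes; it is included only as a convenient check, and the dictionaries `UNARY` and `BINARY` are an assumed encoding of the productions $A \to [$, $B \to ]$, $S \to AC \mid SS \mid AB$ and $C \to SB$.

```python
from itertools import product

# Productions of the CNF grammar from Example 3.15, indexed by right hand side.
UNARY = {"[": {"A"}, "]": {"B"}}
BINARY = {("A", "C"): {"S"}, ("S", "S"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"C"}}


def cyk(word, start="S"):
    """Return True if the CNF grammar above derives `word` from `start`."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar never derives the empty word
    # table[i][l] = set of nonterminals deriving the subword word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(UNARY.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left, right = table[i][split], table[i + split][length - split]
                for x, y in product(left, right):
                    table[i][length] |= BINARY.get((x, y), set())
    return start in table[0][n]
```

For instance, `cyk("[[]][]")` holds while `cyk("][")` and `cyk("")` do not, in line with the claim that the grammar generates exactly the non-empty balanced words.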

3.3. Parse trees and the pumping lemma for context-free languages. 

We will prove a pumping lemma for CFL’s, similar in idea to the pumping 
lemma for regular languages. Before we do this, we need a few new ideas. The 
first of these helps us understand how a derivation is applied in a CFG to give 
a string of terminals. 

Definition 3.16 (Parse trees). 

Let $G$ be a CFG. A parse tree (or derivation tree) for a word $w \in \mathcal{L}(G)$ is a
tree representing all the productions applied to $S$ in a derivation of the word $w$.
That is, a ‘downward’ tree with root (top vertex) $S$, whose vertices of valence
$> 1$ are nonterminals of $G$, and whose leaves (vertices of valence 1) are all
terminals and form $w$ when we ‘read from left to right’.

We build this tree from a derivation $S \xrightarrow[G]{*} w$ as follows:

(1) Place $S$ as the root.

(2) If $S \to \alpha_1$ is the first production in the derivation, then we add $|\alpha_1|$
downward branches to $S$, and label the new leaves from left to right by
the letters (terminals or nonterminals) of $\alpha_1$.

(3) If the next production is $X_2 \to \alpha_2$, then we add $|\alpha_2|$ downward branches
to the leaf $X_2$, and label the new leaves from left to right by the letters
(terminals or nonterminals) of $\alpha_2$.

(4) We keep doing this for each production in the derivation.

(5) At the end of this process, we will have no more nonterminals as leaves
in the tree, only terminals.

We adopt the following ‘drawing convention’: when we go down a level of the
tree (applying productions to all nonterminals on level $n$ to bring us to level
$n + 1$), we may find that some of the vertices on level $n$ were terminals. So as
to not lose track of them, we add one downward branch for each such terminal,
and duplicate them at level $n + 1$ also. Thus, when we finish building the tree
(say with $m$ levels), we can read off the derived word by just reading level $m$
from left to right.

The depth of a tree is the number of edges of the longest path from the root
to a leaf; a tree of depth $n$ will thus have $n + 1$ levels.

Observe that, if $G$ is a CFG in CNF, then a parse tree of $G$ will have at most
$2^n$ symbols at level $n$ (the root is at level 0). This is because the number of
symbols can at most double at each step of a CNF derivation, and we start with one
symbol $S$ at level 0.

Theorem 3.17 (The pumping lemma for CFL’s).

Let $L$ be a CFL. Then there exists a constant $n$ (depending on $L$) such that for
every word $z \in L$ with $|z| \geq n$ we can break up $z$ into 5 words $z = uvwxy$ such
that:

(1) $vx \neq \epsilon$.

(2) $|vwx| \leq n$.

(3) For all $k \geq 0$, we have that the word $uv^k w x^k y$ is also in $L$.

Proof. Let $G$ be a CFG for $L$ in CNF (this exists by Theorem 3.14). Take
$n = 2^{m+1}$, where $m$ is the number of nonterminals of $G$. Suppose $z \in L$ and
$|z| \geq n$. By what we said above, any parse tree for $z$ in $G$ must be of depth at
least $m + 1$, as level $m$ has at most $2^m$ symbols. Let $\gamma$ be (a) longest possible
path starting at the root in such a tree. That path must be of length at least
$m + 1$, and so contains at least $m + 1$ occurrences of nonterminals (only the
last vertex in the path can be a terminal). As $G$ has only $m$ nonterminals, then
by the pigeonhole principle there is some nonterminal which occurs twice in $\gamma$.
Take the first repeated nonterminal $X$ in $\gamma$, when reading from the bottom of
the tree up to the root.

Now break $z$ up into substrings $uvwxy$ such that

(1) $w$ is the string of terminals generated by the lower occurrence of $X$.

(2) $vwx$ is the string of terminals generated by the upper occurrence of $X$.

Let $T$ be the subtree rooted at the upper occurrence of $X$, and let $t$ be the
subtree rooted at the lower occurrence of $X$. Now we can ‘pump’ in two possible
ways:

First way: We can remove the (lower, and thus smaller) subtree $t$ from the
original tree, and replace it with a copy of the (upper, and thus larger) subtree
$T$. This gives us a valid parse tree for the word $uv^2wx^2y$. We can continue
doing this several times, each time removing $t$ and replacing it with a copy of
$T$, to get a valid parse tree for $uv^i w x^i y$ for every $i \geq 1$.

Second way: We can remove the (upper, and thus larger) subtree $T$ from the
original tree, and replace it with a copy of the (lower, and thus smaller) subtree
$t$. This gives us a valid parse tree for the word $uwy$.

Observe that $vx \neq \epsilon$; that is, at least one of $v, x$ is non-empty, as we have
taken two occurrences of $X$ at different levels up the path $\gamma$. Indeed, as $G$ is in
CNF, the production applied at the upper occurrence of $X$ is of the form $X \to YZ$;
the lower occurrence of $X$ lies below exactly one of $Y, Z$, and the other one
generates a non-empty string of terminals. So one ‘side’ of
the upper occurrence of $X$ leads to terminals. This might be the left side (i.e.,
$v$), or the right side (i.e., $x$).

Also, observe that $|vwx| \leq n$, as we chose the first repeated occurrence of a
nonterminal reading up the path $\gamma$, and so this must happen at height at most
$m + 1$. Since $\gamma$ was chosen to be the longest path in the original tree, then the
subderivation down from this upper occurrence of $X$ gives us a tree of depth
at most $m + 1$, and thus has at most $2^{m+1} = n$ terminals. □

We can apply the pumping lemma for CFL’s to show that a language is 
not a CFL, in a similar way to how we apply the pumping lemma for regular 
languages. 

Example 3.18. The set $A = \{a^n b^n c^n \mid n \in \mathbb{N}\}$ is not a CFL.

Proof. Suppose $A$ were a CFL. Take $n$ as in the pumping lemma (Theorem
3.17), and consider the word $z = a^n b^n c^n$. Regardless of how we decompose
$z = uvwxy$ with $|vwx| \leq n$, we have that $vwx$ either contains no occurrence
of $a$, or contains no occurrence of $c$ (or no occurrence of either). Thus, when
we pump $z$ to $uv^k w x^k y$, there will be at least one letter ($a$ or $c$) for which the
number of occurrences of that letter does not change from $uvwxy$ to $uv^k w x^k y$.
However, as $vx \neq \epsilon$, there will be a different letter for which the number of
occurrences of that letter does change from $uvwxy$ to $uv^k w x^k y$. Thus $uv^k w x^k y$ cannot be of
the form $a^m b^m c^m$, for any $k \neq 1$. □
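For small values of $n$, the argument above can be checked by brute force. The sketch below is an illustration, not a proof: it enumerates every decomposition $z = uvwxy$ with $vx \neq \epsilon$ and $|vwx| \leq n$, and tests only the exponents $k \in \{0, 2, 3\}$. A result of `False` does show that no decomposition survives pumping, whereas `True` only means some decomposition survived the exponents tried. The function names, and the comparison language $\{a^n b^n\}$, are illustrative choices.

```python
def in_abc(s):
    """Membership in A = { a^n b^n c^n }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def in_ab(s):
    """Membership in { a^n b^n }, a CFL used for comparison."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumpable(z, n, member, exponents=(0, 2, 3)):
    """Is there z = u v w x y with vx nonempty, |vwx| <= n, and
    u v^k w x^k y in the language for every tested exponent k?"""
    size = len(z)
    for i in range(size + 1):                               # u = z[:i]
        for j in range(i, size + 1):                        # v = z[i:j]
            for k in range(j, size + 1):                    # w = z[j:k]
                for l in range(k, min(size, i + n) + 1):    # x = z[k:l], |vwx| = l - i
                    u, v, w, x, y = z[:i], z[i:j], z[j:k], z[k:l], z[l:]
                    if v + x and all(member(u + v * e + w + x * e + y)
                                     for e in exponents):
                        return True
    return False
```

For example, `pumpable("aabbcc", 2, in_abc)` is false, while `pumpable("aabb", 2, in_ab)` is true (take $v = a$, $w = \epsilon$, $x = b$ around the middle of the word).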

3.4. Nondeterministic pushdown automata. 

In the previous chapter on regular languages, we first defined our languages
via certain automata, and then gave an algebraic way to generate them via regular
expressions. Here, we have done the reverse: we first defined our languages
algebraically, and we now give a ‘mechanical’ way to generate them. These machines
work in a very similar way to $\epsilon$-NFA’s, but they have access to a ‘stack’,
where they can store a finite but unbounded amount of extra information about
what has happened so far in the computation.

Definition 3.19 (Nondeterministic pushdown automata).

A nondeterministic pushdown automaton (NPDA) is a structure
$M = (Q, \Sigma, \Gamma, \delta, q_0, \perp, F)$ consisting of the following:

(1) A finite set of states $Q$.

(2) A finite input alphabet $\Sigma$.

(3) A finite stack alphabet $\Gamma$.

(4) A transition function $\delta : Q \times (\Sigma \cup \{\epsilon\}) \times (\Gamma \cup \{\epsilon\}) \to \mathcal{P}(Q \times \Gamma^*)$ which
is total.

(5) A designated start state $q_0 \in Q$.

(6) A designated initial stack symbol $\perp \in \Gamma$.

(7) A finite set of accept states $F \subseteq Q$.

Note that our transition function is nondeterministic, and also allows for
$\epsilon$-transitions.

The input of an NPDA is any finite string $w = a_1 \cdots a_n \in \Sigma^*$, and a vertical
stack of symbols which initially contains just $\perp$. The NPDA takes $w$, reads the
first symbol $a_1$ whilst ‘in’ the start state $q_0$, evaluates the transition function
$\delta(q_0, a_1, \perp)$ and then simultaneously/nondeterministically does all of the following:
for each $(q, B_1 \cdots B_m) \in \delta(q_0, a_1, \perp)$ the NPDA ‘moves to’ the new state
$q$, removes (pops) the top symbol $\perp$ of the stack and replaces it with (pushes)
$B_1 \cdots B_m$, with $B_m$ going on to the stack first and thus $B_1$ ending up at the
top of the stack. The NPDA then reads the next symbol $a_2$ of $w$, and repeats
the process. This continues for the entire word $w$.

Before we define what it means for an NPDA to accept an input, we need a 
way to keep track of the calculation it is performing 18 . 

Definition 3.20 (Configurations of NPDA’s).

Let $M = (Q, \Sigma, \Gamma, \delta, q_0, \perp, F)$ be an NPDA. A configuration of $M$ is some
$(p, w, \gamma) \in Q \times \Sigma^* \times \Gamma^*$, where

(1) $p$ is the state that the NPDA is currently in.

(2) $w$ is the part of the input word which remains to be read.

(3) $\gamma$ is the contents of the stack (the leftmost letter being at the top; the
rightmost at the bottom).

The start configuration on input $w$ is denoted $(q_0, w, \perp)$.

To see how the NPDA moves from one configuration to another, we define the
next configuration relation $\xrightarrow[M]{1}$:

If $(q, \gamma) \in \delta(p, a, A)$, then for any $y \in \Sigma^*$ and $\beta \in \Gamma^*$ we write

\[ (p, ay, A\beta) \xrightarrow[M]{1} (q, y, \gamma\beta) \]

With this, for any configurations $C, D, E$ we inductively define the following
notation:

$C \xrightarrow[M]{0} D$ if $C = D$.

$C \xrightarrow[M]{n+1} D$ if there exists $E$ with $C \xrightarrow[M]{n} E$ and $E \xrightarrow[M]{1} D$.

$C \xrightarrow[M]{*} D$ if there exists $n \geq 0$ with $C \xrightarrow[M]{n} D$.

We can now define acceptance by an NPDA: 

Definition 3.21 (Acceptance by an NPDA).

An NPDA $M = (Q, \Sigma, \Gamma, \delta, q_0, \perp, F)$ is said to accept $w$ by final state if
$(q_0, w, \perp) \xrightarrow[M]{*} (q, \epsilon, \gamma)$ for some $q \in F$ and $\gamma \in \Gamma^*$. $M$ is said to accept $w$ by
empty stack if $(q_0, w, \perp) \xrightarrow[M]{*} (q, \epsilon, \epsilon)$ for some $q \in Q$.

We say $M$ accepts by final state to mean that the language of $M$ is given by all
words $w$ which $M$ accepts by final state. That is,

\[ M \text{ accepts by final state} \implies \mathcal{L}(M) := \{w \in \Sigma^* \mid M \text{ accepts } w \text{ by final state}\} \]

We say $M$ accepts by empty stack to mean that the language of $M$ is given by
all words $w$ which $M$ accepts by empty stack. That is,

\[ M \text{ accepts by empty stack} \implies \mathcal{L}(M) := \{w \in \Sigma^* \mid M \text{ accepts } w \text{ by empty stack}\} \]

We make the following two remarks:

(1) Because of $\epsilon$-transitions, it is possible for an NPDA to enter an infinite
loop and never finish reading the input word; it can just keep modifying
the stack forever, without reading any further symbols of the input
word.

(2) If the stack ever becomes empty before the entire input word is read,
then the machine becomes stuck, as there is no transition to apply.

18 As we have to worry about the state, the remainder of the tape, and the contents of the
stack.
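The definitions above translate fairly directly into a small simulator. The following is a minimal sketch under stated assumptions, not a definitive implementation: the transition function is encoded as a hypothetical dictionary mapping `(state, letter, top)` to a set of `(state, pushed)` pairs, with `""` standing for $\epsilon$ and stacks written as tuples with the top on the left; every transition is assumed to pop one stack symbol, as in the next configuration relation; and a `max_steps` cut-off guards against the infinite loops of remark (1), so a `False` answer really means "no accepting computation found within the bound". The example machine is also hypothetical.

```python
from collections import deque


def accepts(delta, start, bottom, finals, word, mode, max_steps=100_000):
    """Breadth-first search over configurations (state, remaining input, stack)."""
    first = (start, word, (bottom,))
    seen, queue, steps = {first}, deque([first]), 0
    while queue and steps < max_steps:
        state, rest, stack = queue.popleft()
        steps += 1
        if rest == "":
            if mode == "final" and state in finals:
                return True
            if mode == "empty" and not stack:
                return True
        if not stack:
            continue                       # remark (2): the machine is stuck
        top, below = stack[0], stack[1:]
        moves = [("", rest)]               # epsilon-move: read nothing
        if rest:
            moves.append((rest[0], rest[1:]))
        for letter, new_rest in moves:
            for new_state, pushed in delta.get((state, letter, top), ()):
                cfg = (new_state, new_rest, tuple(pushed) + below)
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False


# Hypothetical example: start state p, initial stack symbol Z, accept state q.
DELTA = {
    ("p", "a", "Z"): {("p", ("X",))},      # first a: replace Z by X
    ("p", "a", "X"): {("p", ("X", "X"))},  # further a's: push one more X
    ("p", "b", "X"): {("q", ())},          # each b pops one X
    ("q", "b", "X"): {("q", ())},
}
```

With this machine, acceptance by empty stack gives $\{a^n b^n \mid n \geq 1\}$, whereas acceptance by final state also admits words such as `aab`: the same machine can define different languages under the two modes of acceptance.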


3.5. Equivalence of acceptance by final state or empty stack. 

It turns out that NPDA’s which accept by final state have exactly the same 
computational power as those which accept by empty stack. 


Definition 3.22 (Constructing an NPDA where acceptance by final state and
empty stack coincide).

Given an arbitrary NPDA $M$ which accepts by either final state or empty
stack, we can construct an NPDA $M'$ with a single accept state for which
acceptance by final state or empty stack coincide. The construction depends
slightly on whether $M$ itself accepts by final state or empty stack; we do the
two constructions together, and point out the places where they differ.

Let $M = (Q, \Sigma, \Gamma, \delta, q_0, \perp, F)$ be our NPDA that accepts by final state or empty
stack. Take two new symbols $u, t$ not in $Q$, and ⫫ a new stack symbol not in
$\Gamma$. Now define

\[ G := \begin{cases} Q & \text{if } M \text{ accepts by empty stack,} \\ F & \text{if } M \text{ accepts by final state,} \end{cases}
\qquad
\Delta := \begin{cases} \{⫫\} & \text{if } M \text{ accepts by empty stack,} \\ \Gamma \cup \{⫫\} & \text{if } M \text{ accepts by final state.} \end{cases} \]

Now define the NPDA $M'$ by

\[ M' = (Q \cup \{u, t\}, \Sigma, \Gamma \cup \{⫫\}, \delta', u, ⫫, \{t\}) \]

where we define $\delta'$ as an extension of $\delta$ by adding the following:

(1) $\delta'(u, \epsilon, ⫫) := \{(q_0, \perp⫫)\}$

(2) $\delta'(q, \epsilon, A) := \delta(q, \epsilon, A) \cup \{(t, A)\}$, $\forall q \in G$, $\forall A \in \Delta$

(3) $\delta'(t, \epsilon, A) := \{(t, \epsilon)\}$, $\forall A \in \Gamma \cup \{⫫\}$

and $\delta' := \delta$ for all other inputs.

So our new automaton $M'$ has a new start state $u$, a new initial stack symbol
⫫, and a new single final state $t$. It computes as follows:

(1) In its first computational step, it pushes the old initial stack symbol $\perp$
on top of ⫫ (via an $\epsilon$-transition), then enters the old start state $q_0$.

(2) It then runs precisely like $M$, as it has all the transitions of $M$, and has
$\perp$ on top of its stack.

(3) At some point it might enter state $t$, its accept state. If it does, then it
proceeds to empty its entire stack (via $\epsilon$-transitions).

(4) The only way $M'$ can empty its stack is if it enters state $t$; no other
state allows it to pop ⫫.

(5) Between reaching state $t$ and emptying its stack, $M'$ does not read any
more of its input word.

Thus $M'$ accepts by empty stack iff it accepts by final state.
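The construction can be tried out on a small machine. The sketch below is illustrative only and rests on several assumptions: transitions are encoded as a dictionary from `(state, letter, top)` to sets of `(state, pushed)` pairs with `""` for $\epsilon$; the names `"u"`, `"t"` and `"⫫"` are assumed to be fresh; every transition pops one stack symbol; and the example machine `DELTA` is hypothetical. A compact breadth-first simulator is included so that the block is self-contained.

```python
from collections import deque
from itertools import product


def accepts(delta, start, bottom, finals, word, mode, max_steps=100_000):
    """Bounded breadth-first search over configurations (state, input left, stack)."""
    first = (start, word, (bottom,))
    seen, queue, steps = {first}, deque([first]), 0
    while queue and steps < max_steps:
        state, rest, stack = queue.popleft()
        steps += 1
        if rest == "" and ((mode == "final" and state in finals)
                           or (mode == "empty" and not stack)):
            return True
        if not stack:
            continue
        moves = [("", rest)] + ([(rest[0], rest[1:])] if rest else [])
        for letter, new_rest in moves:
            for new_state, pushed in delta.get((state, letter, stack[0]), ()):
                cfg = (new_state, new_rest, tuple(pushed) + stack[1:])
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False


def make_coinciding(states, gamma, delta, start, bottom, finals, mode):
    """Sketch of Definition 3.22; `mode` is how the given machine M accepts."""
    u, t, dbl = "u", "t", "⫫"                             # assumed fresh names
    G = set(states) if mode == "empty" else set(finals)
    Delta = {dbl} if mode == "empty" else set(gamma) | {dbl}
    new_delta = {key: set(val) for key, val in delta.items()}
    new_delta[(u, "", dbl)] = {(start, (bottom, dbl))}       # rule (1)
    for q in G:                                              # rule (2)
        for A in Delta:
            new_delta.setdefault((q, "", A), set()).add((t, (A,)))
    for A in set(gamma) | {dbl}:                             # rule (3)
        new_delta[(t, "", A)] = {(t, ())}
    return new_delta, u, dbl, {t}


# Hypothetical machine: start state p, initial stack symbol Z, accept state q.
DELTA = {
    ("p", "a", "Z"): {("p", ("X",))},
    ("p", "a", "X"): {("p", ("X", "X"))},
    ("p", "b", "X"): {("q", ())},
    ("q", "b", "X"): {("q", ())},
}


def languages_agree(mode, max_len=4):
    """Compare M (in `mode`) with M' (in both modes) on all short words over {a, b}."""
    d2, u, dbl, finals2 = make_coinciding({"p", "q"}, {"Z", "X"}, DELTA, "p", "Z", {"q"}, mode)
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            expected = accepts(DELTA, "p", "Z", {"q"}, w, mode)
            if accepts(d2, u, dbl, finals2, w, "final") != expected:
                return False
            if accepts(d2, u, dbl, finals2, w, "empty") != expected:
                return False
    return True
```

Calling `languages_agree("empty")` and `languages_agree("final")` compares the two machines on all words of length at most 4; this is a finite spot check of the statement proved in Theorem 3.23, not a substitute for the proof.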

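Theorem 3.23 (Accepting by final state and empty stack).

Take an NPDA $M$, and construct the NPDA $M'$ as per Definition 3.22. Then
$\mathcal{L}(M') = \mathcal{L}(M)$.

Proof. We first show $\mathcal{L}(M) \subseteq \mathcal{L}(M')$:

If $M$ accepts by empty stack, and accepts $w$, then $(q_0, w, \perp) \xrightarrow[M]{n} (q, \epsilon, \epsilon)$ for
some $n$ and some $q \in Q$. But then we have that

\[ (u, w, ⫫) \xrightarrow[M']{1} (q_0, w, \perp⫫) \xrightarrow[M']{n} (q, \epsilon, ⫫) \xrightarrow[M']{1} (t, \epsilon, ⫫) \xrightarrow[M']{1} (t, \epsilon, \epsilon) \]

and so $M'$ accepts $w$.

If, instead, $M$ accepts by final state, and accepts $w$, then $(q_0, w, \perp) \xrightarrow[M]{n} (q, \epsilon, \gamma)$
for some $n$, some $q \in F$, and some $\gamma \in \Gamma^*$. But then we have that

\[ (u, w, ⫫) \xrightarrow[M']{1} (q_0, w, \perp⫫) \xrightarrow[M']{n} (q, \epsilon, \gamma⫫) \xrightarrow[M']{1} (t, \epsilon, \gamma⫫) \xrightarrow[M']{*} (t, \epsilon, \epsilon) \]

and so $M'$ accepts $w$.

Thus, in both cases, $M'$ accepts 19 $w$ if $M$ does. So $\mathcal{L}(M) \subseteq \mathcal{L}(M')$.

We now show $\mathcal{L}(M') \subseteq \mathcal{L}(M)$:

Suppose $M'$ accepts $w$ by either mode (final state or empty stack). Then we
have that

\[ (u, w, ⫫) \xrightarrow[M']{1} (q_0, w, \perp⫫) \xrightarrow[M']{n} (q, y, \gamma⫫) \xrightarrow[M']{1} (t, y, \gamma⫫) \xrightarrow[M']{*} (t, \epsilon, \epsilon) \]

for some $q \in G$, $\gamma \in \Gamma^*$. But $y = \epsilon$, since $M'$ can’t read any input symbols
once it enters state $t$ (the only transitions involving $t$ are $\epsilon$-transitions; (3) from
Definition 3.22). So by the way $M$ is simulated by $M'$, we have

\[ (q_0, w, \perp) \xrightarrow[M]{n} (q, \epsilon, \gamma) \]

Now consider the definitions of $G$ and of $\Delta$, and the transitions of the form (2)
from Definition 3.22 which describe what the first move into state $t$ can be. If
we try and analyse how the transition $(q, \epsilon, \gamma⫫) \xrightarrow[M']{1} (t, \epsilon, \gamma⫫)$ could come about,
then we observe the following:

(1) If $M$ accepts by empty stack, then we must have $\gamma = \epsilon$.

(2) If $M$ accepts by final state, then we must have $q \in F$.

Either way, the computation $(q_0, w, \perp) \xrightarrow[M]{n} (q, \epsilon, \gamma)$ gives that $M$ accepts $w$. Thus
$\mathcal{L}(M') \subseteq \mathcal{L}(M)$. □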

3.6. Equivalence of CFL’s and NPDA’s. 

We can now prove that NPDA’s accept precisely the set of CFL’s. We do this
in two parts. We first show that, from a CFG, we can construct an NPDA with
the same accepted language. Then we show that the reverse is also possible:
from an NPDA we can construct a CFG with the same accepted language.
In both constructions, we see that the object we construct (NPDA or CFG)
mimics the operation of the object we started with (CFG or NPDA) in some
controlled way.

Definition 3.24 (Constructing an NPDA from a CFG).

Given a CFG $G = (N, \Sigma, P, S)$, we construct from it an NPDA which accepts
by empty stack, as follows:

First, we use Lemma 3.13 to re-write $G$ so that all productions are of the form

\[ A \to c B_1 \cdots B_k \]

where $c \in \Sigma \cup \{\epsilon\}$ and $k \geq 0$. By Lemma 3.13, this new CFG generates exactly
the same language, and so we will discard our original CFG and call this new
one by the same name ($G = (N, \Sigma, P, S)$).

Now, from $G$, we construct an NPDA $M = (\{q\}, \Sigma, N, \delta, q, S, \emptyset)$, where

(1) $q$ is the sole state of $M$.

(2) $\Sigma$ (the terminals of $G$) is the input alphabet of $M$.

(3) $N$ (the nonterminals of $G$) is the stack alphabet of $M$.

(4) $q$ is the start state of $M$.

(5) $S$ (the start symbol of $G$) is the initial stack symbol of $M$.

(6) $\emptyset$ is the set of accept states of $M$ (irrelevant, as $M$ accepts by empty
stack).

(7) $\delta$, the transition function of $M$, is defined as follows: for each production
$A \to c B_1 \cdots B_k$ in $P$, we include $(q, B_1 \cdots B_k)$ in the set $\delta(q, c, A)$.

19 Either final state or empty stack, as these are equivalent for $M'$.
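Item (7) is essentially a one-line dictionary construction, as the following sketch suggests. It assumes a hypothetical encoding of each production $A \to c B_1 \cdots B_k$ as a triple `(A, c, [B_1, ..., B_k])` with `""` for $c = \epsilon$, and it uses an example grammar for non-empty balanced brackets that is chosen for illustration (it is not one of the grammars in these notes). The small search routine only decides acceptance by empty stack for the one-state machine, with a step bound in case $\epsilon$-transitions make the search space infinite.

```python
def npda_from_cfg(productions):
    """delta(q, c, A) contains (q, B_1...B_k) for each production A -> c B_1...B_k."""
    delta = {}
    for head, c, body in productions:
        delta.setdefault(("q", c, head), set()).add(("q", tuple(body)))
    return delta


def accepts_by_empty_stack(delta, start_symbol, word, max_steps=100_000):
    """Search the configurations (remaining input, stack) of the one-state NPDA."""
    todo = [(word, (start_symbol,))]
    seen, steps = set(todo), 0
    while todo and steps < max_steps:
        rest, stack = todo.pop()
        steps += 1
        if not stack:
            if not rest:
                return True        # input consumed and stack empty
            continue               # stuck with input left over
        moves = [("", rest)] + ([(rest[0], rest[1:])] if rest else [])
        for c, new_rest in moves:
            for _, pushed in delta.get(("q", c, stack[0]), ()):
                cfg = (new_rest, pushed + stack[1:])
                if cfg not in seen:
                    seen.add(cfg)
                    todo.append(cfg)
    return False


# Hypothetical grammar: S -> [B | [SB | [BS | [SBS,  B -> ]
PRODUCTIONS = [
    ("S", "[", ["B"]), ("S", "[", ["S", "B"]),
    ("S", "[", ["B", "S"]), ("S", "[", ["S", "B", "S"]),
    ("B", "]", []),
]
DELTA = npda_from_cfg(PRODUCTIONS)
```

For example, `accepts_by_empty_stack(DELTA, "S", "[[]][]")` holds, while the words `][` and `[[]` are rejected. Note how the stack always holds the nonterminals of the current sentential form that are still to be expanded.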

Before we prove various facts about this construction, we need to introduce 
a particular type of derivation, known as a leftmost derivation. 

Definition 3.25 (Leftmost derivation).

Let $G$ be a CFG. A derivation $\beta \xrightarrow[G]{*} \gamma$ is said to be a leftmost derivation if
each production in the derivation is applied to the leftmost nonterminal in the
sentential form.

It is immediate that, if a word $w$ lies in $\mathcal{L}(G)$ for some CFG $G$, then we can
always derive $w$ with a leftmost derivation $S \xrightarrow[G]{*} w$, by swapping the order
of some of the productions. To see this, draw a parse tree for the derivation,
then re-order the applications of productions; we still have the same parse tree,
and thus another derivation $S \xrightarrow[G]{*} w$. This idea works because we are dealing
with context-free grammars; ones in which we replace one nonterminal with
some other string, and thus the context of this nonterminal (that is, the other
symbols around it) does not matter.

The operation of the NPDA $M$ constructed from the CFG $G$ in Definition
3.24 is closely related to that of $G$. We will see that leftmost derivations of $G$
from $S$ to a sentence $w$ of terminals correspond to accepting computations of
$M$ on input $w$. More strongly: the sequence of sentential forms in the leftmost
derivation of $w$ corresponds to the sequence of configurations of $M$ on input $w$.
Thus we see that the machine $M$, and the CFG $G$, operate in the same way.
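As a small illustration of this correspondence, consider a hypothetical grammar (chosen for brevity, not taken from these notes) with the two productions $S \to [B$ and $B \to \,]$. Both are of the form $A \to cB_1 \cdots B_k$, so Definition 3.24 gives $\delta(q, [, S) = \{(q, B)\}$ and $\delta(q, ], B) = \{(q, \epsilon)\}$. The leftmost derivation of the word $[\,]$ and the computation of $M$ on that word then line up step by step:

```latex
\[ S \;\xrightarrow[G]{1}\; [B \;\xrightarrow[G]{1}\; [\,] \]
\[ (q, [\,], S) \;\xrightarrow[M]{1}\; (q, ], B) \;\xrightarrow[M]{1}\; (q, \epsilon, \epsilon) \]
```

At each stage, the sentential form is the part of the input already read followed by the current stack contents: $S$, then $[$ followed by $B$, then $[\,]$ followed by the empty stack.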


Lemma 3.26 (Operation of the NPDA constructed from a CFG).

Let $G$ be a CFG, and $M$ the NPDA constructed from it in Definition 3.24.
Then, for any $y, z \in \Sigma^*$, any $\gamma \in N^*$, and any $A \in N$, we have that

\[ A \xrightarrow[G]{n} z\gamma \text{ via a leftmost derivation} \iff (q, zy, A) \xrightarrow[M]{n} (q, y, \gamma) \]


Proof. We prove this by induction on $n$.
Basis: If $n = 0$ then

\[ A \xrightarrow[G]{0} z\gamma \iff z = \epsilon \text{ and } \gamma = A \iff (q, zy, A) = (q, y, \gamma) \iff (q, zy, A) \xrightarrow[M]{0} (q, y, \gamma) \]

Induction (assuming the statement holds for all $k \leq n$):

We break this up into the forward ($\Rightarrow$) implication and the reverse ($\Leftarrow$) implication.

($\Rightarrow$):

Suppose $A \xrightarrow[G]{n+1} z\gamma$ via a leftmost derivation. Suppose $B \to c\beta$ was the last
production applied in this leftmost derivation, where $c \in \Sigma \cup \{\epsilon\}$ and $\beta \in N^*$.
Then

\[ A \xrightarrow[G]{n} uB\alpha \xrightarrow[G]{1} uc\beta\alpha = z\gamma \]

where $z = uc$ and $\gamma = \beta\alpha$. By induction, as $A \xrightarrow[G]{n} uB\alpha$, we have that

\[ (q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha) \]

But by the definition of $\delta$ for $M$, we have that $(q, \beta) \in \delta(q, c, B)$, as $B \to c\beta$ is
a production of $G$. So we get

\[ (q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha) \]

Combining these, we see that

\[ (q, zy, A) = (q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha) = (q, y, \gamma) \]

and thus

\[ (q, zy, A) \xrightarrow[M]{n+1} (q, y, \gamma) \]

($\Leftarrow$):

Suppose $(q, zy, A) \xrightarrow[M]{n+1} (q, y, \gamma)$. Suppose that the last transition taken was
given by $(q, \beta) \in \delta(q, c, B)$. Then $z = uc$ for some $u \in \Sigma^*$,
$\gamma = \beta\alpha$ for some $\alpha \in N^*$, and

\[ (q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha) \xrightarrow[M]{1} (q, y, \beta\alpha) \]

By induction, as $(q, ucy, A) \xrightarrow[M]{n} (q, cy, B\alpha)$, we have that $A \xrightarrow[G]{n} uB\alpha$ via
a leftmost derivation in $G$. Moreover, by construction of $M$, we have that
$B \to c\beta$ is a production of $G$ (as $(q, \beta) \in \delta(q, c, B)$). But now we can apply this
production to the sentential form $uB\alpha$ to get

\[ A \xrightarrow[G]{n} uB\alpha \xrightarrow[G]{1} uc\beta\alpha = z\gamma \]

via a leftmost derivation. □


Theorem 3.27 (Language of the NPDA constructed from a CFG).

Let $G$ be a CFG, and $M$ the NPDA constructed from it in Definition 3.24. Then
$\mathcal{L}(G) = \mathcal{L}(M)$.

Proof. Take any word $w \in \Sigma^*$. Then we see that

\[ \begin{aligned}
w \in \mathcal{L}(G) &\iff S \xrightarrow[G]{*} w \text{ by a leftmost derivation} \\
&\iff (q, w, S) \xrightarrow[M]{*} (q, \epsilon, \epsilon) && \text{(Lemma 3.26)} \\
&\iff w \in \mathcal{L}(M) && \text{(as } M \text{ accepts by empty stack)}
\end{aligned} \]

□

We now show that the reverse process is also possible. In fact, we will practically
invert the construction.

Lemma 3.28 (Constructing a CFG from an NPDA with one state).

Let $M = (\{q\}, \Sigma, \Gamma, \delta, q, \perp, \emptyset)$ be an NPDA with one state which accepts by
empty stack. Define the CFG $G = (\Gamma, \Sigma, P, \perp)$, where $P$ contains the production
$A \to c B_1 \cdots B_k$ for every case where $(q, B_1 \cdots B_k) \in \delta(q, c, A)$, with $c \in \Sigma \cup \{\epsilon\}$.
Then $\mathcal{L}(G) = \mathcal{L}(M)$.

Proof. This is the exact same argument used in Lemma 3.26 and Theorem 3.27,
as all reasoning used was bi-directional ($\iff$). □

Of course, there is no immediate reason to assume that every NPDA is equivalent
to one that accepts by empty stack and has only one state. We
give a construction of this here, and then prove that our new NPDA accepts
the same language as our old one.

Definition 3.29 (NPDA which accepts by empty stack, with 1 state).
Take any NPDA $K$, and use Definition 3.22 and Theorem 3.23 to convert it to
an NPDA $M = (Q, \Sigma, \Gamma, \delta, s, \perp, \{t\})$ which accepts by final state and by empty
stack equivalently, has one final state $t$, and satisfies $\mathcal{L}(M) = \mathcal{L}(K)$.

Now, we define the set

\[ \Gamma' := Q \times \Gamma \times Q \]

This is our new stack alphabet, and we will use this to ‘simulate’ the action of
$M$ on the stack of our new NPDA. We write elements of $\Gamma'$ as $\langle p\, A\, q \rangle$, where
$p, q \in Q$ and $A \in \Gamma$. We now construct our new NPDA $M'$ to be

\[ M' := (\{*\}, \Sigma, \Gamma', \delta', *, \langle s \perp t \rangle, \emptyset) \]

with one state $*$, where $M'$ accepts by empty stack.

We define the transition function $\delta'$ of $M'$ as follows: for each transition
$(q_0, B_1 \cdots B_k) \in \delta(p, c, A)$ (where $c \in \Sigma \cup \{\epsilon\}$) we, for all possible choices
of $q_1, \ldots, q_k \in Q$, include $(*, \langle q_0\, B_1\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle)$ in the set
$\delta'(*, c, \langle p\, A\, q_k \rangle)$.

Observe that, for $k = 0$, this reduces to the following: if $(q_0, \epsilon) \in \delta(p, c, A)$,
then we include $(*, \epsilon)$ in $\delta'(*, c, \langle p\, A\, q_0 \rangle)$.

The point of this construction is that the new machine $M'$ will be able to
scan a word $w$ starting with only $\langle p\, A\, q \rangle$ on its stack and end up with an empty
stack iff $M$ can start scanning $w$ in state $p$ with only $A$ on its stack and end up
in state $q$ with an empty stack.

The idea here is that $M'$ simulates $M$, guessing nondeterministically which
states $M$ will be in at certain future points in the computation, saving those
guesses on the stack, and then verifying later that those guesses were correct.
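The definition of $\delta'$ can be sketched as a loop over all guesses $q_1, \ldots, q_k$. The code below is an illustration under assumptions: transitions are encoded as a dictionary from `(state, letter, top)` to sets of `(state, pushed)` pairs with `""` for $\epsilon$; triples $\langle p\, A\, q \rangle$ are Python tuples `(p, A, q)`; and the example machine is hypothetical (it accepts $\{a^n b^n \mid n \geq 1\}$ by empty stack, always finishing in state `q`, so it already has the shape required of $M$ with $s$ = `p` and $t$ = `q`). The recursive `run` helper is adequate only because this example has no $\epsilon$-transitions.

```python
from itertools import product


def triple_construction(states, delta):
    """Build delta' of Definition 3.29 from delta."""
    new_delta = {}
    for (p, c, A), targets in delta.items():
        for q0, body in targets:
            k = len(body)
            for guess in product(states, repeat=k):      # all choices of q_1 .. q_k
                chain = (q0,) + guess
                pushed = tuple((chain[i], body[i], chain[i + 1]) for i in range(k))
                key = ("*", c, (p, A, chain[-1]))        # chain[-1] is q_k (q_0 if k = 0)
                new_delta.setdefault(key, set()).add(("*", pushed))
    return new_delta


def run(delta, word, stack):
    """Acceptance by empty stack for the one-state machine (no epsilon-loops assumed)."""
    if not stack:
        return word == ""
    options = [("", word)] + ([(word[0], word[1:])] if word else [])
    return any(run(delta, rest, pushed + stack[1:])
               for c, rest in options
               for _, pushed in delta.get(("*", c, stack[0]), ()))


# Hypothetical M: start state p, initial stack symbol Z, empties its stack in state q.
DELTA = {
    ("p", "a", "Z"): {("p", ("X",))},
    ("p", "a", "X"): {("p", ("X", "X"))},
    ("p", "b", "X"): {("q", ())},
    ("q", "b", "X"): {("q", ())},
}
DELTA_PRIME = triple_construction(("p", "q"), DELTA)
```

Starting from the stack symbol `("p", "Z", "q")`, the one-state machine accepts `aabb` and rejects `aab`. Starting from `("p", "Z", "p")` it accepts nothing, since $M$ never empties its stack in state `p`: this is an instance of a wrong guess failing verification.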
We now prove that these two NPDA’s operate in an analogous manner. 

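Lemma 3.30.

Let $M'$ be the NPDA constructed from $M$ in Definition 3.29. Then

\[ (p, w, B_1 \cdots B_k) \xrightarrow[M]{n} (q, \epsilon, \epsilon) \]

iff there exist $q_0, \ldots, q_k$ such that $p = q_0$, $q = q_k$, and

\[ (*, w, \langle q_0\, B_1\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle) \xrightarrow[M']{n} (*, \epsilon, \epsilon) \]

In particular, we have that

\[ (p, w, B) \xrightarrow[M]{n} (q, \epsilon, \epsilon) \iff (*, w, \langle p\, B\, q \rangle) \xrightarrow[M']{n} (*, \epsilon, \epsilon) \]
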

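Proof. We show this by induction on $n$. The base case $n = 0$ is trivial, as both
sides are then equivalent to the assertion that $p = q$, $w = \epsilon$, and $k = 0$. So now
suppose the assertion is true for all $l \leq n$.

Firstly, assume we have that $(p, w, B_1 \cdots B_k) \xrightarrow[M]{n+1} (q, \epsilon, \epsilon)$. Let
$(r, C_1 \cdots C_m) \in \delta(p, c, B_1)$ give the first transition applied, where $c \in \Sigma \cup \{\epsilon\}$ and $m \geq 0$.
Then we have that $w = cy$ and

\[ (p, w, B_1 \cdots B_k) \xrightarrow[M]{1} (r, y, C_1 \cdots C_m B_2 \cdots B_k) \xrightarrow[M]{n} (q, \epsilon, \epsilon) \]

By induction, we have that there exist $r_0, \ldots, r_{m-1}, q_1, \ldots, q_k$ such that $r = r_0$,
$q = q_k$, and

\[ (*, y, \langle r_0\, C_1\, r_1 \rangle \cdots \langle r_{m-1}\, C_m\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle) \xrightarrow[M']{n} (*, \epsilon, \epsilon) \]

Now, by construction of $M'$, we have that

\[ (*, \langle r_0\, C_1\, r_1 \rangle \cdots \langle r_{m-1}\, C_m\, q_1 \rangle) \in \delta'(*, c, \langle p\, B_1\, q_1 \rangle) \]

Combining these, we get

\[ \begin{aligned}
(*, w, \langle p\, B_1\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle)
&\xrightarrow[M']{1} (*, y, \langle r_0\, C_1\, r_1 \rangle \cdots \langle r_{m-1}\, C_m\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle) \\
&\xrightarrow[M']{n} (*, \epsilon, \epsilon)
\end{aligned} \]

Conversely, suppose we have

\[ (*, w, \langle q_0\, B_1\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle) \xrightarrow[M']{n+1} (*, \epsilon, \epsilon) \]

So let

\[ (*, \langle r_0\, C_1\, r_1 \rangle \cdots \langle r_{m-1}\, C_m\, q_1 \rangle) \in \delta'(*, c, \langle q_0\, B_1\, q_1 \rangle) \]

give the first transition applied, where $c \in \Sigma \cup \{\epsilon\}$ and $m \geq 0$. Then $w = cy$ and
we have that

\[ \begin{aligned}
(*, w, \langle q_0\, B_1\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle)
&\xrightarrow[M']{1} (*, y, \langle r_0\, C_1\, r_1 \rangle \cdots \langle r_{m-1}\, C_m\, q_1 \rangle \langle q_1\, B_2\, q_2 \rangle \cdots \langle q_{k-1}\, B_k\, q_k \rangle) \\
&\xrightarrow[M']{n} (*, \epsilon, \epsilon)
\end{aligned} \]

By induction, we have that

\[ (r_0, y, C_1 \cdots C_m B_2 \cdots B_k) \xrightarrow[M]{n} (q_k, \epsilon, \epsilon) \]

Also, by construction of $M'$, we have that $(r_0, C_1 \cdots C_m) \in \delta(q_0, c, B_1)$.
Combining these, we see that

\[ (q_0, w, B_1 \cdots B_k) \xrightarrow[M]{1} (r_0, y, C_1 \cdots C_m B_2 \cdots B_k) \xrightarrow[M]{n} (q_k, \epsilon, \epsilon) \]

□

Theorem 3.31. Let $M'$ be the NPDA constructed from $M$ in Definition 3.29.
Then $\mathcal{L}(M') = \mathcal{L}(M)$.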

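Proof. Take $w \in \Sigma^*$. Then

\[ \begin{aligned}
w \in \mathcal{L}(M') &\iff (*, w, \langle s \perp t \rangle) \xrightarrow[M']{*} (*, \epsilon, \epsilon) \\
&\iff (s, w, \perp) \xrightarrow[M]{*} (t, \epsilon, \epsilon) && \text{(Lemma 3.30)} \\
&\iff w \in \mathcal{L}(M)
\end{aligned} \]

□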

Corollary 3.32. A language $L$ is a CFL iff $L = \mathcal{L}(M)$ for some NPDA $M$.

We can use this to give a (mechanical) proof that every regular language is a
CFL.

Theorem 3.33. Let $L$ be a regular language. Then $L$ is a CFL.

Proof. Take an $\epsilon$-NFA $E$ with $\mathcal{L}(E) = L$. Then we can re-interpret $E$ as
an NPDA which accepts by final state, where we just need to introduce one
dummy stack alphabet symbol $\perp$, and have every transition mimic one from $E$
but where we pop $\perp$ from the stack and then push it straight back on again. □

There are more direct ways of proving that every regular language is a CFL,
but we have shown that CFL’s are a true ‘generalisation’ by showing that we
have a more general machine.



